" In a nutshell, every effort is made to ensure you cannot compare these with the servers of "Big Blue" or the x86 competition."
Of course not. If they did that it would interfere with their deceptive marketing campaign with the banner headline "An Oracle Box costing $$stupid is several times faster than an IBM box costing $$3xStupid"; where, if you look up model dates, you see they're comparing a several-year-old IBM box against their latest and greatest. (I've never been bored enough to dig that deeply, but my inner cynic suspects that they're probably larding a bunch of expensive stuff that doesn't do anything for Java onto the IBM box to inflate its price even more.)
The reason Oracle sometimes compares to an older IBM model is that IBM has not released newer benchmarks. IBM does the same: for instance, IBM claims that one z10 Mainframe with 64 sockets can replace 1,500 x86 servers. If you dig a bit, it turns out all the x86 servers are something like 1GHz Pentium 3 with 256MB RAM, and they all idle. Yes, literally, all x86 servers idle, whereas the Mainframe is 100% loaded. What happens if some x86 servers start to do some work? The Mainframe will choke. I can emulate a Mainframe on my laptop with the open source emulator "TurboHercules"; is it OK if I claim that my laptop can replace three IBM Mainframes (if they all idle)?
Regarding this Intel Xeon E7 cpu. Sure it is nice, but it has twice the number of cores as the competition. Another thing is that the largest x86 servers have 8-sockets. There are no larger x86 servers than that. The only 32 socket servers are Unix and Mainframes. Some Unix servers even have 64 sockets. Thus, the x86 does not scale above 8-sockets.
For scalability, you must distinguish between scale-out and scale-up. Scale-out is a cluster: just add a new node and you have increased scalability. Clusters are used for HPC number crunching workloads where you run a tight for loop on some independent data (ideally each node fits everything in the cpu cache). Some examples of scale-out servers (clusters) are all servers on the Top-500 supercomputer list. Other examples are SGI Altix / UV2000 servers or the ScaleMP server; they have 10,000s of cores and 64 TB RAM or more, i.e. they are clusters. Sure, they run a single unified Linux kernel image, but they are still clusters. If you read a bit, you will see that the SGI servers are using MPI. And MPI is used on clusters for HPC number crunching.
Scale-up servers are one single fat huge server. They might have 16 or 32 sockets, some even have 64 sockets! They weigh 1000 kg and cost many many millions. For instance, the old IBM P595 Unix server used for the old TPC-C record has 32 sockets and costs $35 million (no typo). One single server with 32 cpus costs $35 million. You will never ever see these prices on clusters. If you buy an SGI server with 100s of sockets, you will essentially pay the same price as buying individual nodes with the same number of sockets. But scale-up servers need heavy redesign and innovative scalability tech, and that is the reason a 16 or 32 socket server costs many many times more than an SGI cluster having 100s of sockets. They are not in the same arena. These scale-up servers are typically used for SMP workloads, not HPC workloads. SMP workloads are typically large databases or Enterprise ERP workloads. This code is heavily branch intensive, so you can not fit it into a cpu cache. It branches everywhere, and clusters can not run these Enterprise workloads because the performance would be very bad. If you need to run Enterprise workloads (where the big margin and big money is) you need to go to 32 socket servers. And they are all RISC or Mainframe servers. Examples are IBM P795, Oracle M6-32, Fujitsu M10-4S, HP Superdome/Integrity. They all run AIX, Solaris or HP-UX, and they all have up to 32 or 64 sockets. Some attempts have been made to compile Linux for these huge servers, but the results have been bad because Linux has problems scaling above 8 sockets. The reason is that the Linux kernel devs do not have access to 32 socket SMP servers, because they don't exist, so how can the Linux kernel be optimized for 32 sockets? Ted Tso, the famous Linux kernel developer writes: http://thunk.org/tytso/blog/2010/11/01/i-have-the-... 
"...Ext4 was always designed for the “common case Linux workloads/hardware”, and for a long time, 48 cores and large RAID arrays were in the category of “exotic, expensive hardware”, and indeed, for much of the ext2/3 development time, most of the ext2/3 developers didn’t even have access to such hardware...." Ted Tso considers servers with 48 cores in total, to be huge and out of reach for Linux developers. He is not talking about 48 socket servers, but 48 cores which is chicken shit in the mature Enterprise arena.
For instance, in the Big Tux HP experiment, Linux was compiled for a 64 socket HP Integrity server with catastrophic results: the cpu utilization was ~40%, which means more than every other cpu idles under full load. Google "Big Tux" and read it yourself.
There is a reason the huge Linux servers such as SGI UV2000 with 1000s of cores are so cheap in comparison to 16 socket or 32 socket Unix servers, and why the Linux servers are exclusively used for HPC number crunching workloads, and never SMP workloads:
SGI servers are only used for HPC clustered workloads, and never for SMP enterprise workloads: http://www.realworldtech.com/sgi-interview/6/ "Typically, much of the work in HPC scientific code is done inside loops, whereas commercial applications, such as database or ERP software are far more branch intensive. This makes the memory hierarchy more important, particularly the latency to main memory. Whether Linux can scale well with a workload is an open question. However, there is no doubt that with each passing month, the scalability in such environments will improve. Unfortunately, SGI has no plans to move into this SMP market, at this point in time."
Same with the ScaleMP Linux server with 1000s of cores, is never used for SMP workloads: http://www.theregister.co.uk/2011/09/20/scalemp_su... "The vSMP hypervisor that glues systems together is not for every workload, but on workloads where there is a lot of message passing between server nodes – financial modeling, supercomputing, data analytics, and similar parallel workloads. Shai Fultheim, the company's founder and chief executive officer, says ScaleMP has over 300 customers now. "We focused on HPC as the low-hanging fruit."
The difficult thing is to scale well above 8 sockets. You can release one single strong cpu which does not scale. To scale above 8 sockets is very difficult, ask Intel. Thus, this Intel Xeon E7 cpu is only used in servers with up to 8 sockets. For more oomph, you need 32 or even 64 sockets: Unix or Mainframes. SGI Linux servers can not replace these large Unix servers. And that is the reason Linux never will venture into the lucrative Enterprise arena, and never replace large Unix servers. The largest Linux servers capable of Enterprise SMP workloads have 8 sockets. The Linux clusters don't count.
Another reason why this Intel Xeon E7 can not touch the high end server market (beyond scalability limitations) is that the RAS is not good enough. RAS is very very expensive. For instance, IBM Mainframes and high end SPARC cpus can replay an instruction if there was an error. x86 can not do this. Some Mainframes have three cpus and compare every computation, and if there is an error, the failing cpu will shut down. It is very very expensive to create this tailor made hardware. It is easy to get good performance: just turn up the GHz to the point of instability. But can you rely on that hardware? No. Enterprise needs reliability above all else. You must trust your hardware. It is much better to have one slower reliable server than a super fast cranked up GHz server where some computations are false. No downtime! x86 can not do this. The RAS is lagging severely behind and it will take decades before Intel can catch up with Unix or Mainframe servers. And at that point, the x86 cpus will be as expensive!
Thus:
-Intel Xeon E7 does not scale above 8 sockets. Unix does. So you will never challenge the high end market where you need extreme performance. Besides, the largest Unix servers (Oracle) have 32TB RAM. Intel Xeon E7 has only 6TB RAM, which is nothing. So x86 does not scale cpu wise, nor RAM wise.
-Intel Xeon E7 does not have sufficient RAS, and the servers are unreliable, besides the x86 architecture which is inherently buggy and bad (some sysadmins would not touch an x86 server with a ten-foot pole, and only use OpenVMS/Unix or Mainframe): http://www.anandtech.com/show/3593
-Oracle is much much much much cheaper than IBM POWER systems. The Oracle SPARC server pricing is X for each cpu. So if you buy the largest M6-32 server with 32TB of RAM you pay 32 times X. Whereas IBM POWER systems cost more and more per socket the more sockets you buy. If you buy 32 sockets, you pay much much much more than four times the price of 8 sockets.
Oracle will release a 96-socket SPARC server with up to 96TB RAM. It will be targeted for database work (not surprisingly as Oracle is mainly interested in Databases) and other SMP workloads. Intel x86 will never be able to replace such a huge monster. (Sure, there are clustered databases running on HPC servers, but they can not replace SMP databases). Look at the bottom pic, to see how all sockets are connected to each other in 32 socket configuration. There are only 2-3 hops to reach each node, which is very good. For HPC clusters, the worst case requires many many hops, which makes them unusable for SMP workloads http://www.theregister.co.uk/2013/08/28/oracle_spa...
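The hop-count argument above can be illustrated with a small Python sketch. It counts the worst-case number of hops between any two sockets for two made-up 32-socket layouts: a plain ring, and a two-level mesh (8 fully connected groups of 4, with same-numbered sockets also linked across groups). Both layouts and all function names are assumptions for illustration only; neither is claimed to be the actual Oracle interconnect or any real cluster fabric.

```python
from collections import deque

def ring(n):
    """Hypothetical layout: each socket links only to its two neighbours."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}

def two_level_mesh(groups, per_group):
    """Hypothetical layout: full mesh inside each group, plus a link from
    socket s of a group to socket s of every other group."""
    adj = {(g, s): set() for g in range(groups) for s in range(per_group)}
    for (g, s) in adj:
        for s2 in range(per_group):
            if s2 != s:
                adj[(g, s)].add((g, s2))
        for g2 in range(groups):
            if g2 != g:
                adj[(g, s)].add((g2, s))
    return adj

def worst_case_hops(adj):
    """Largest shortest-path length between any two sockets (BFS from each)."""
    worst = 0
    for start in adj:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        worst = max(worst, max(dist.values()))
    return worst

if __name__ == "__main__":
    print("ring of 32:", worst_case_hops(ring(32)))
    print("8 groups of 4:", worst_case_hops(two_level_mesh(8, 4)))
```

The point of the sketch is only that worst-case latency depends on how many links you are willing to pay for per socket: the meshed layout needs 10 links per socket here, the ring only 2.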
32 sockets to run SMP workloads. "typically large databases or Enterprise ERP workloads". Sounds like we are solving a problem with hardware instead of being innovative in software.
"Intel Xeon E7 has only 6TB RAM - which is nothing".
Dangerous comment. 12 TB is possible with an octal Xeon at a fraction of the cost of the unix boxes you talk about. 1 - 12 TB is enough for a massive part of the market, even a large part of the "lucrative" enterprise market.
I agree with you that there are some workloads which are out of the Xeon's league. But it is shrinking...each time a bit more.
"than a super fast cranked up GHz where some computations are false"
That is another bad statement without any proof.
"The RAS is lacking severly behind and will take decades before Intel can catch up on Unix or Mainframe servers. And at that point - the x86 cpus will be as expensive!"
Considering that the vast majority of problems are related to software (drivers included), I doubt very much that even better RAS can make a big difference. A mature software stack is what makes these monster servers reliable; the hardware plays a small role.
Secondly, Intel charges just as much as the market is willing to pay. They can spread the core development cost over many more CPUs than the RISC vendors, so chances are that they will never be as expensive as the RISC vendors.
-- Sounds like we are solving a problem with hardware instead of being innovative in software
Well, it depends on what one means by "innovation". The Kiddie Koders have been recreating the likes of IDMS & IMS (early to mid 1960s approaches), all with newer names but identical semantics and storage models. To leverage such machines, relational data is the answer. Minimum storage footprint, DRI, and such. Use SSD, and beat the crap out of these ne'er-do-well RBAR messes.
"Innovative software stacks" might imply something modern and better like immutable databases which are at the opposite end of the spectrum vs IMS placing relational databases inbetween. Read up http://engineering.linkedin.com/distributed-system... concrete examples of good paradigms would be Datomic as well as Event Store.
6TB or 12TB is not really interesting as we are entering the Big Data age. Oracle has 32TB today, and with compression you can run huge databases from RAM. And the 96-socket server will have 96TB RAM, which will run databases even faster. Databases are everything; they are at the heart of a company, and without databases the company will halt. There are examples of companies without a backup of their database going bankrupt when the database got wiped out in a crash. The most important part of a company is the database, the information.
I am trying to say that it is better to have a slow and 100% reliable server, than a fast overclocked server that is a bit unstable - for Enterprise customers. There are things that must not go down, no crashes allowed.
For large workloads, Oracle SPARC is widening the gap to all other cpus, because Oracle is doubling performance every generation. Intel does not do that, nor does IBM. Try to benchmark an 8-socket x86 server against the Oracle 32-socket SPARC M5-32 monster server. Or against the Fujitsu 64 socket M10-4S server sporting the Fujitsu developed SPARC Venus cpu: http://www.theregister.co.uk/2012/10/01/fujitsu_or... Or the coming 96-socket SPARC server. :)
A 32TB or 96TB server is also not really interesting for companies dealing with "Big Data" and big databases. What happens when your working set grows even more? Shut your company down and wait until Oracle manages to build an even larger database? These monsters are mainly interesting to companies where lack of software development foresight and/or capability had engineered them into a corner where they have to buy themselves out by getting a larger hammer. Smarter organizations pour their R&D into making their software and databases scale out and provide RAS on the cluster level. The monsters, while very sexy, are interesting for a tiny fraction of huge dinosaur corporations, and even those will slowly die out by succumbing to their own weight. The dying out will of course take a long time due to the amount of fat these corporations have managed to accumulate, providing ample lucrative options for companies facilitating their death by providing stupidly expensive solutions to problems better solved by changing how the game is played.
Intel's advantage in CPU design stems from massive consumer usage. The individual Ivy Bridge core used in these 15 core monster is the same fundamental design that was introduced to notebooks/desktops in 2012. Essentially the end consumers get to be the guinea pigs and any errata found within the first six months can be adopted into the server design before it ships. What makes these a server CPU is the focus on IO and RAS features outside of the CPU core (which have their own inherent design costs).
IBM and the other RISC vendors don't have the luxury of a high volume design. Mainframe installations number between 10,000 and 20,000 depending on source. That is not very many at either end of that range. IBM's POWER installations are several times larger in terms of units but still dwarfed by just the x86 server unit shipments. On the high end, this has led to some rather large prices from IBM: http://www-01.ibm.com/common/ssi/ShowDoc.wss?docUR...
The one thing that matters for RAS is just uptime. The easiest way to get there is to cluster basic services so that a single node can be taken offline and addressed while the external interface fails over to another node. This basic principle is true regardless of hardware, as you want to run a system in a minimum of a pair, ideally a primary pair with an offsite backup system. The one nice thing is that software licensing here isn't as dreadful as scaling up: often there is a small discount to make it less painful. Virtualization of smaller systems has helped in terms of RAS by making it possible to migrate running systems around a VM farm. Hypervisors now support shadow images so that there is no additional network traffic for a VM to fail over to another node in case of a hardware failure. The x86 platform in many cases is 'good enough' that 99.999% uptime can be achieved with forward thinking resource planning.
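To put the 99.999% figure in perspective, here is a back-of-the-envelope Python sketch. It converts an availability fraction into expected downtime per year, and shows what pairing two nodes does under the (optimistic) assumption that nodes fail independently and failover is instant. The 99.9% single-node figure is a made-up input, not a measurement of any real server.

```python
# Rough availability arithmetic; every input below is an illustrative assumption.
MINUTES_PER_YEAR = 365.25 * 24 * 60

def downtime_minutes_per_year(availability):
    """Expected downtime per year for a given availability fraction."""
    return (1.0 - availability) * MINUTES_PER_YEAR

def redundant_availability(node_availability, nodes):
    """Availability of a group that stays up while at least one node is up.

    Assumes independent failures and instant failover, which real
    clusters only approximate.
    """
    return 1.0 - (1.0 - node_availability) ** nodes

if __name__ == "__main__":
    five_nines = downtime_minutes_per_year(0.99999)
    print(f"99.999% uptime allows about {five_nines:.2f} minutes down per year")
    pair = redundant_availability(0.999, 2)  # hypothetical 99.9% nodes
    print(f"two 99.9% nodes in a pair: {pair:.6f}")
```

Under those assumptions a pair of fairly ordinary nodes already lands beyond five nines on paper, which is why the clustering argument carries so much weight; correlated failures and failover time eat into that in practice.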
I'm sorry, but it is considered best practice to run databases in pairs for redundancy. For example, here is an Oracle page explaining how clustering is used to maintain high availability: http://docs.oracle.com/cd/B28359_01/server.111/b28...
Other databases like MySQL and MS SQL Server have similar offerings.
There is a reason why big hardware like this is purchased in pairs or sets of three.
Kevin G. you are actually correct. We are in the process of comparing performance of Power7+ vs Xeon v2 for SAP batch workload and we got pretty much the same arguments from our AIX guys as Brutalizer mentioned.
We are using real batch jobs rather than a synthetic benchmark and we set up each system to compare core-for-core, down to running a memory defrag on the Power system to make sure memory access is as good as possible. The only thing we could not fix is that in terms of network access, the Intel system was handicapped.
What we are seeing is that we can tune the Intel system to basically get similar performance (< 5% difference of total runtime) to that of the Power7+ system (P780). This was quite unexpected but it's an illustration of how far Intel and the hardware vendors building servers/blades based on those CPUs have come.
Looking at the Xeon E7 V2's right now is wise since they're just hitting market and the core infrastructure is expected to last three generations. It wouldn't surprise me if you can take a base system today using memory daughter cards and eventually upgrade it to Broadwell-EX and more DDR4 memory by the end of the product life cycle. This infrastructure is going to be around for awhile.
POWER7+ on the other hand is going to be replaced by the POWER8 later this year. I'd expect it to perform better than the POWER7+ though how much will have to wait for the benchmarks after it is released. There is always going to be something faster/better/cheaper coming down the road in the computing world. Occasionally waiting makes sense due to generational changes like this. Intel and IBM tend to leap frog each other and it is IBM's turn to jump.
Ultimately if you gotta sign the check next week, I'd opt for the Xeon but if you can hold off a few months, I'd see what the POWER8 brings.
Power8 will be interesting to look at, but based on current data it will have to yield a pretty impressive performance boost over Power7+ (and Xeon v2) in order to be competitive on a performance per dollar spent.
IBM is claiming two to three times the throughput over POWER7+. Most of that gain isn't hard to trace: increasing the core count from 8 to 12. That change alone will put it ahead of the Xeon E7 v2's in terms of raw performance. Minor IPC and clock speed increases are expected too. The increase from 4 way to 8 way SMT will help some workloads, though it could also hurt others (IBM does support dynamic changes in SMT so this is straightforward to tune). The rest will likely come from system level changes like lower memory access times thanks to the L4 cache on the serial-to-parallel memory buffer and more bandwidth all around. What really interests me is that IBM is finally dropping the GX bus they introduced for coherency in the POWER4. What the POWER8 does is encapsulate coherency over a PCIe physical link. It'll be interesting to see how it plays out.
As you may suspect, the cost of this performance may be rather high. We'll have to see when IBM formally launches systems.
I think Brutalizer is saying that this new Xeon CPU is pretty much for a targeted market. Unix has long been the backbone of the internet, and Intel wants to cover as much of the general server market as it can. Sure it's a nice CPU, but as reliability goes, I would rather use a system that is slower but reliable in terms of calculations. I would still give Intel the thumbs up for trying something new or updating the cpu. As for replacing Unix servers for large database enterprise workloads, probably not for a long time for Intel. I would tell Intel to leave this area to the real experts who focus on just this market. Intel is just covering its turf in the smaller scale server market.
The x86 servers have caught up in RAS features. High end features like hot memory add/remove are available on select systems. (Got a bad DIMM? Replace it while the system is running.) Processor add/remove on a running system is also possible on newer systems but requires some system level support (though I'm not immediately familiar with a system offering it.) In most cases with the base line RAS features, Xeons are more than good enough for the job. Hardware lockstep is also an option on select systems.
Uses for ultra high end features like two bit error correction for memory, RAID5-like parity across memory channels, and hot processor add/remove are a very narrow niche. Miscellaneous features like instruction replay don't actually add much in terms of RAS (replay on Itanium is used mainly to fill up unused instruction slots in its VLIW architecture, where as lock step would catch a similar error in all cases). Really, the main reason to go with Unix is on the software side, not the hardware side anymore.
Brutalizer writes; "Some examples of Scale-out servers (clusters) are all servers on the Top-500 supercomputer list. Other examples are SGI Altix / UV2000 servers or the ScaleMP server, they have 10,000s of cores and 64 TB RAM or more, i.e. cluster. Sure, they run a single unified Linux kernel image - but they are still clusters. ..."
Re the UV, that's not true at all. The UV is a shared memory system with a hardware MPI implementation. It can scale codes well beyond just a few dozen sockets. Indeed, some key work going on atm is how to scale relevant codes beyond 512 CPUs, not just 32 or 64. The Cosmos installation is one such example. Calling a UV a cluster is just plain wrong. Its shared memory architecture means it can handle very large datasets (hundreds of GB) and extremely demanding I/O workloads; no conventional 'cluster' can do that.
Have you read about the ScaleMP Linux server (it has 8192 cores or even more) in my link above? It also has a shared memory system, running a single Linux kernel image. They solve the scalability problem by using a software hypervisor that tricks Linux into believing it is running on a SMP server, and not a cluster. If you read the post in that link, a programmer writes:
"...I tried running a nicely parallel shared memory workload (75% efficiency on 24 cores in a 4 socket opteron box) on a 64 core ScaleMP box with 8 2-socket boards linked by infiniband. Result: horrible. It might look like a shared memory, but access to off-board bits has huge latency...."
Thus, the huge ScaleMP Linux server is only good for workloads where each node runs independent code, with little interaction to other nodes - that is the hallmark of HPC number crunching stuff running on clusters.
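The quoted "75% efficiency on 24 cores" number can be turned into a rough scaling estimate. The Python sketch below uses Amdahl's law, which is an assumption brought in here for illustration (the quoted programmer does not say how the workload was modelled), and all function names are made up. It derives the serial fraction implied by the 24-core result and extrapolates to 64 cores on an ideal shared-memory machine with no extra off-board latency.

```python
def speedup_from_efficiency(efficiency, cores):
    """Parallel speedup implied by an efficiency figure (speedup / cores)."""
    return efficiency * cores

def amdahl_serial_fraction(speedup, cores):
    """Solve Amdahl's law S = 1 / (s + (1 - s) / n) for the serial fraction s."""
    return (cores / speedup - 1.0) / (cores - 1.0)

def amdahl_speedup(serial_fraction, cores):
    """Ideal speedup on `cores` cores for a given serial fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

if __name__ == "__main__":
    s24 = speedup_from_efficiency(0.75, 24)       # figure quoted in the thread
    serial = amdahl_serial_fraction(s24, 24)
    s64 = amdahl_speedup(serial, 64)              # idealised extrapolation
    print(f"speedup on 24 cores: {s24:.1f}")
    print(f"implied serial fraction: {serial:.4f}")
    print(f"ideal speedup on 64 cores: {s64:.1f} ({s64 / 64:.0%} efficiency)")
```

Even in this idealised model the efficiency drops to roughly half at 64 cores, before any off-board latency is counted, so a "horrible" result on boards linked over InfiniBand is not surprising for a workload like that.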
@mapesdhs: "...Calling a UV a cluster is just plain wrong..." Regarding whether the SGI UV server is a cluster or not: there is a very simple litmus test to find out if it is really a cluster. Is the SGI UV server used for HPC workloads or SMP workloads? SGI themselves say it is only for HPC workloads. As does ScaleMP.
If you want to prove I am wrong, and your claim is correct: show us links to customers running large SMP workloads on the SGI UV cluster. Hint: you will not find a counter example. Why?
1) SGI says they are not going to try SMP workloads (which is odd, as there is the really big money)
2) It uses MPI, which is a library used for HPC number crunching. I myself have programmed MPI for scientific computations, and I tell you that you can not rewrite Oracle Database or DB2 or MySQL using MPI without great effort. MPI is for passing messages between processes running on different nodes. A large SMP server does not need MPI libraries or anything like that; it is just programmed as a usual server.
So, instead of you telling me I am wrong, I suggest you just show us links with SMP workloads for the SGI UV2000 server, which is the easiest way to settle this question. I have shown links where SGI says their big Altix server is not for SMP workloads, it is only for HPC, which means: cluster. If you can show that many customers are replacing large Unix 32 socket servers with SGI UV2000 servers, then you are right, and I am wrong. And I will shut up.
Have you not thought about why the old mature Unix servers are still stuck at 32 or 64 sockets, whereas Linux servers exist in configurations of 1-8 sockets or 100s of sockets, but nothing in between? Answer: the 1-8 socket Linux servers are just ordinary x86 servers, and they are great for SMP workloads such as SAP or ERP or whatever. The 100s socket Linux servers are all clusters. There are no 32 socket Linux SMP servers for sale, and there never have been. Linux scales badly on SMP workloads; the maximum is 8 sockets. If you check 8-socket Linux benchmarks, the cpu utilization is quite bad. For instance, SAP benchmarks show Linux having ~88% cpu utilization whereas Solaris has 99% cpu utilization. Solaris scales much better even on x86 servers with as few as 8 sockets, where Linux has problems. That is the reason Solaris has higher performance on SAP benchmarks, although the Linux server used faster CPUs and faster RAM DIMMs.
Why do the 32 socket Unix servers cost much more than the largest SGI server configuration? Answer: because SGI is a cluster, consisting of X cheap nodes.
"... And here's the punch line, Solaris has never even run on a 1024 cpu system let alone one as big this new SGI system, and Linux has handled it just fine for years. Yet, ZFS creator Jeff Bonwick feels compelled to imply that Linux doesn't scale and Solaris does. To claim that Solaris is more ready to scale on large multi-core systems is pure FUD, and I'm saddened to see someone as technically gifted as Jeff stoop to this level....Now, this all would be amusing if this were the early 90's and us Linux folk were "just a bunch of silly hobbyists." Yet these Solaris guys behave as if we're still in that era."
He clearly has no clue that SGI is a cluster running HPC workloads, whereas Solaris runs SMP workloads on 32/64 socket servers. In fact, decades ago, there was a 144 socket Solaris server. In 2015, Oracle will release a 16,384-thread server with 64TB RAM. The point is: SPARC is doubling performance every generation, whereas Intel is not. SPARC T4 was the world's fastest cpu in Enterprise database workloads two years ago, and last year's SPARC T5 servers are four times as fast as T4. This year, SPARC T6 will arrive, which will be twice as fast again.
There is no chance in hell Intel will match Unix servers on large workloads. 8-socket Intel x86 servers can never compete with 32 or 64 socket Unix servers.
Very interesting comments, but I would like to ask you, what about Itanium (9500 series) ?
I think that Intel reserves 8+ sockets for the Itanium series, which is capable of up to 32-socket systems.
I can't really say whether Intel could build a 32-socket x86-64 system if there were no Itanium.
BTW, I can't find Enterprise benchmarks for the top 9500 series Itaniums, like the 9560.
What is the performance of a 32-socket Itanium 9560 system (for example Superdome 2 or other) compared to Oracle SPARC M5/M6-32 or an IBM equivalent?
A direct comparison of an 8-socket Itanium 9560 system with an 8-socket Xeon E7 v2 system would also be interesting, to see the internal competition of the two platforms.
Itanium has very bad performance, as it is not actively developed anymore. Even back then, Itanium had bad performance. Itanium had better RAS than performance. HP provided the RAS capabilities from their HP-UX servers (PA-RISC cpus). And now Intel has learned some RAS from HP, and Intel is trying to tack RAS onto x86 instead, and killing off Itanium. Intel learned RAS. But the x86 RAS is not good enough yet. They can not replay faulty instructions, can not compare output of several cpus and shut faulty cpus down, etc.
But Itanium exists in the 64 socket HP Integrity (or is it Superdome) servers, running HP-UX. This is the Big Tux server, I wrote about.
Intel desperately wants to get into the high end 16/32 socket servers, where the big money is. But x86 lacks scalability, and has been stuck at 8 sockets forever. Also, the operating system needs to be mature and optimized for 16 or 32 sockets, and Linux is not. Because there are no such large Linux servers, it can not be optimized for 16 or 32 socket SMP servers. Only recently have people been trying to compile Linux for the old mature Unix servers, with bad results. It takes decades to scale well. Scalability is very difficult. Otherwise Intel would be selling 16 or 32 socket x86 servers, raking in the BIG money.
But, this x86 E7 cpu is nice, definitely. But to say it will compete in the high end is ridiculous. You need 32 or 64 sockets for that, and at least 32TB RAM. And you need extreme RAS.
You seem to describe a complete dead end for Intel.
Because if x86 is inherently limited to 8-socket scalability and has no luck with extreme RAS, then Intel's decision to invest in another ISA for high end servers, like EPIC, was right.
But if Itanium is a low performance CPU, even with high RAS, then Intel is doomed.
I can't see how it could penetrate into the top high end systems where the big money is.
Intel has announced that several major RAS features from Itanium will make its way to x86 systems. The main thing is integrated lock step and processor hot swap. These two features can be found on specific x86 servers that provide the additional logic for these.
Similarly, x86 can scale beyond 8 sockets with additional glue logic. This is the same for Itanium and SPARC.
"...Similarly, x86 can scale beyond 8 sockets with additional glue logic. This is the same for Itanium and SPARC...."
Yes, there is in fact a 16-socket x86 server released last year by Bullion. It has quite bad performance, and I guess the cpu utilization is not that good, given the bad performance. If Intel is going to scale beyond 8 sockets, they need to do it well, or nobody is going to use them for SMP work when they can buy an 8 or 16 socket SPARC or POWER server, with headroom for growth to 32 or 64 sockets.
Except you are intentionally ignoring the glue logic that SGI has developed for the UV2000.
With Intel phasing out Itanium, their x86 Xeon line is picking up where Itanium left off. It does help that both the recent Itaniums and Xeons are using QPI interconnects as the glue logic developed for one can be used for the other architecture. (I haven't seen this confirmed but I'm pretty sure that the glue logic for the SGI UV1000 was originally intended to be used with Itaniums.)
x86 is not inherently limited to 8-socket scalability. Intel needs to develop techniques to scale beyond 8 sockets, which is very difficult to do. Even if Intel scales above 8 sockets, Intel needs to scale well and utilize all resources, which is difficult. So, with time, we might see 16-socket x86 servers. And in another decade maybe 24 socket x86 servers. But the mature Unix OSes have taken decades to get into the 32 socket arena, and they started big, always had big servers as the target. Intel starts from the desktop, and is trying to venture into larger servers. Intel may succeed given time. But not today.
Are you sure 16+ sockets is the only route to SMP workloads? It seems to me Intel is choosing the direction of not increasing socket count but rather core count, what does that sound like in term of feasibility? What can be said about 72 core 3TFlop Xeon Phi behind a large ERP?
Xeon Phi is utterly crap at database/ERP workloads. Extracting memory level parallelism is required to get good per-thread performance in database workloads, and you need a pretty good OoO core to extract it. Xeon Phi is designed for regularly structured computation kernels and will be multiple times slower than a big core. You cannot compensate that with more cores either because the workloads do still have scalability limits where you hit into nasty contention issues. To get even to the level of scalability they currently have has taken blood, sweat and tears on all levels of the stack, from the kernel, through the database engine to the application code using the database.
The reason I cite Big Tux is that those are the only benchmarks I have seen for Linux running on 64 sockets. If you have other benchmarks, please link to them so I can stop referring to Big Tux.
I have never attributed Linux's bad performance on Big Tux to Itanium's poor performance. I attribute Linux's bad performance on Big Tux to this: Linux had ~40% cpu utilization on the 64 socket Big Tux Itanium server. This means every other cpu idles under full load when using Linux. Is this bad or not? This has nothing to do with Itanium. If Linux ran on 64 socket SPARC or POWER - it would still idle ~40%.
Thus, my conclusion of Linux's bad performance is because of the low cpu utilization. It has nothing to do with how fast or slow the hardware is. Instead, how well does Linux utilize all resources on large servers? Answer: very badly.
Talking about slandering Linux, have you read this from a prominent Linux kernel developer? http://vger.kernel.org/~davem/cgi-bin/blog.cgi/200... "...And here's the punch line, Solaris has never even run on a 1024 cpu system let alone one as big this new SGI system, and Linux has handled it just fine for years. Yet Mr. Bonwick feels compelled to imply that Linux doesn't scale and Solaris does. To claim that Solaris is more ready to scale on large multi-core systems is pure FUD, and I'm saddened to see someone as technically gifted as Jeff stoop to this level..."
Who is slandering who? Is it FUD to say that Linux has scalability problems over 8 sockets? Is it FUD to say that there has never been a 32 socket Linux server for sale? Or is it just that he is not aware of different types of scalability: clusters or SMP servers? Is it just pure ignorance, when he believes a 4096 core Linux cluster can replace a 32 socket SMP server? What do you think? Is it FUD when the ZFS creator claims that Linux does not scale on 32 socket servers, or is it in fact a true claim? Who is FUDing who?
Linux scales just as well as Unix on large socket counts. Case in point: IBM's own benchmarks on their p795 systems with 32 sockets, 256 cores and 1024 threads, where AIX only beats Linux by a mere 2.7%. Source: http://www-03.ibm.com/systems/power/hardware/795/p...
I should also point out that your link is 7 years old. Things have changed in the Linux kernel.
Well you're right, but it's not as bad for x86 as you make it sound. Systems like TITAN were examples of scale-out compute, if ever there was one. I'll grant it's not the same in terms of what they calculate (Titan is simulation focused and GPU focused) and less on pure RAS and rapid DB access like ERP (not transactional / real time). But that's essentially irrelevant. The point is how they scale in terms of number of nodes and the cost of nodes.
Intel's newest chip is cool, but not practical in terms of price competition (which is why Titan used more Opteron nodes instead of Xeon, for example). What you're focused on is price competition at the ultimate upper end of the spectrum, where SPARC and Power live. And that, in turn, means the price of the highest end single system. Intel may be trying to break into that space, but no, it doesn't make sense because x86 wasn't designed for it as an architecture. Their single systems won't compete, yet.
But that's not to say this new Xeon is irrelevant. It isn't. It will, however, have problems because the price-per-performance isn't competitive. In a scale-out design you want more, cheaper nodes and to beat the competition by volume. These nodes are just too expensive when you want performance per dollar.
What most mid-to-large companies need is a scalable setup that grows with their business. A lot of IT is bean counting and cost cutting. If you want to start SMP, you start small and tack on additional systems, because your budget people won't let you get a SPARC system or Unix setup. Oracle just doesn't offer systems or prices that are reasonable, and because of this, many businesses that need SMP won't give them a second glance. This is where x86 and Xeon fit into the picture: scale out, starting small and building up. But these new systems are asking too much and people aren't going to be interested.
Intel has effectively killed off the Itanium. The original 22 nm Kittson has been scrapped and the successor to Poulson is going to be on 32 nm as well. After that, nothing appears on Intel's roadmap for the chip.
HP, the largest Itanium customer, has already announced that their NonStop mainframe line is moving to x86:
"So, instead of you telling me I am wrong, I suggest you just show us links with SMP workloads for the SGI UV2000 server... then you are right, and I am wrong. And I will shut up."
United States Post Office running Oracle Data Warehouse software on a SGI UV1000 (the older sibling of the UV2000, still shared memory and cache coherent): https://www.fbo.gov/index?s=opportunity&mode=f...
But please, Kevin G, don't you know that Hadoop is a clustered solution? Why do you think people are running clustered database solutions such as Hadoop on a SGI UV2000 server? Is it because SGI says it is for clustered benchmarks only?
There are a couple of reasons why someone would run Hadoop on a UV2000. First, the UV2000 has a large global address space in which data could reside directly (i.e. no disk access necessary!). If the raw data fits in 64 TB, performance should be very good. Secondly, Hadoop is free under the Apache license. Traditional database software like Oracle charges a premium the more sockets there are installed on a system. I'd imagine that a 256 socket UV2000 system would incur an Oracle licensing fee in the tens of millions of US dollars. So between the choice of free or tens of millions of dollars, most organizations would at least try to work with the free solution.
"Thus, the x86 does not scale above 8-sockets." The SGI UV2000 is a fully cache coherent server that scales up to 256 sockets. It uses some additional glue logic but this is no different than what Oracle uses to obtain similar levels of scalability.
"Some examples of Scale-out servers (clusters) are all servers on the Top-500 supercomputer list. Other examples are SGI Altix / UV2000 servers... Scale-up servers, are one single fat huge server."
"The reason is the Linux kernel devs does not have access to 32 socket SMP server, because they dont exist, so how can Linux kernel be optimized for 32 sockets?"
Oh wow, you're confusing a file system with the kernel. You do realize that Linux has support for many different file systems? Even then, Ext4 is actually shown to scale after a few patches per that link. Also of particular note is that 4 years ago, when that article was written, Ext4 was not suited for production purposes. In the years since, this has changed, as has its scalability.
"For instance the Big Tux HP server, compiled Linux to 64 socket HP integrity server with catastrophic results, the cpu utilization was ~40%, which means every other cpu idles under full load. Google on Big Tux and read it yourself."
Big Tux was an ancient Itanium server that was constrained by equally ancient FSB architecture. Even with HP-UX, developers are lucky to get high utilization rates due to the quirks of Itanium's EPIC design.
Readers should note that this link is a decade old and obviously SGI technology has changed over the past decade.
"Thus, this Intel Xeon E7 cpu are only used up to 8-sockets servers. For more oomph, you need 32 socket or even 64 sockets - Unix or Mainframes."
Modern x86 and Itanium chips from Intel only scale to 8 sockets without additional glue logic. This is similar to modern SPARC chips from Oracle which need glue logic to scale past 8 sockets. IBM is the only major vendor which does not use glue logic as the GX/GX+/GX++ use a multi-tiered ring topology (one for intra-MCM and one for inter-MCM communication).
"Another reason why this Intel Xeon E7 can not touch the high end server market (beyond scalability limitations) is that the RAS is not good enough."
"Thus: -Intel Xeon E7 does not scale above 8-sockets. Unix does. So you will never challenge the high end market where you need extreme performance. Besides, the largest Unix servers (Oracle) have 32TB RAM. Intel Xeon E7 has only 6TB RAM - which is nothing. So x86 does not scale cpu wise, nor RAM wise."
The new Xeon E7v2's can have up to 1.5 TB of memory per socket and in an 8 socket system that's 12 TB before needing glue logic. The SGI UV2000 scales to 256 sockets and 64 TB of memory. Note that SGI's UV2000's memory capacity is actually limited by the 46 bit physical address space while maintaining full coherency.
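The capacity figures above can be checked with a little arithmetic. This is only a sketch built from the numbers quoted in this comment; the variable names are made up for illustration, and "TB" is assumed to mean a binary terabyte (2^40 bytes).

```python
# Sanity check of the memory figures quoted above.
# Assumption: "TB" is treated as a binary terabyte (2**40 bytes).
TB = 2**40

tb_per_socket = 1.5          # E7 v2 maximum per socket, as quoted
glueless_sockets = 8         # largest system before glue logic is needed
glueless_capacity_tb = tb_per_socket * glueless_sockets

physical_address_bits = 46   # physical address width, as quoted
addressable_tb = 2**physical_address_bits // TB

print(f"8-socket capacity: {glueless_capacity_tb} TB")
print(f"46-bit address space: {addressable_tb} TB")
```

Under those assumptions the 64 TB ceiling of the UV2000 lines up exactly with what 46 physical address bits can reach.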
"-Intel Xeon E7 has no sufficient RAS, and the servers are unreliable, besides the x86 architecture which is inherently buggy and bad (some sysadmins would not touch a x86 server with a ten feet pole, and only use OpenVMS/Unix or Mainframe): http://www.anandtech.com/show/3593 "
Nice. You totally missed the point of that article. It was more a commentary on yearly ISA increases in the x86 space and differences between AMD and Intel's implementations. This mainly played out with the FMA instructions between AMD and Intel (AMD supported 4 operand FMA in Bulldozer whereas Intel supported 3 operand FMA in Haswell. AMD's Piledriver core added support for 3 operand FMA.) Additionally, ISA expansion should be relatively rare, not on a yearly cadence, to foster good software adoption.
ISA expansion has been a part of every platform so by your definition, everything is buggy and bad (and for reference, IBM's z/OS mainframes have even more instructions than x86 does).
"-Oracle is much much much much cheaper than IBM POWER systems. The Oracle SPARC servers pricing is X for each cpu. So if you buy the largest M6-32 server with 32TB of RAM you pay 32 times X. Whereas IBM POWER systems costs more and more the more sockets you buy. If you buy 32 sockets, you pay much much much more than for 8 sockets."
This came out of nowhere in the conversation. Seriously, in the above post, where did you mention pricing for POWER or SPARC systems? Your fanboyism is showing. I think you cut/pasted this from the wrong script.
Regarding my link about Ted Tso, talking about filesystems. You missed my point. He says explicitly, that Linux kernel developers did not have access to large 48 core systems. 48 cores, translates to... 8-sockets. I tried to explain this in my post, but apparently failed. My point is, if prominent Linux kernel developers think 8-socket servers are "exotic hardware" - how well do you think that Linux scales on 8-sockets? No Linux developer has such a big server with 8-sockets to optimize Linux for. Let alone 16 or 32 sockets. I would be very surprised if Linux scaled well beyond 8-sockets without even optimizing for larger servers.
Then you talk about how large the SGI UV2000 servers are, etc etc. And my link where SGI explains that their predecessor Altix server is only suitable for HPC workloads - is rejected by you. And the recent ScaleMP link I showed, where they say it is only used for HPC workloads - is also rejected by you I believe - on what grounds I don't know. Maybe because it is 2 years old? Or the font on the web page is different? I don't know, but you will surely find something to reject the link on.
Maybe you do accept that the SGI Altix server is a cluster fit for HPC workloads, as explained by SGI? But you do not accept that the UV2000 is a successor to Altix - but instead the UV2000 server is a full blown SMP server somehow. When the huge companies IBM and Oracle and HP are stuck at 32 sockets, suddenly, SGI has no problems scaling to 1000s of cpus for a very cheap price. You dont agree something is a bit weird in your logical reasoning?
Unix: decades of research from the largest companies, IBM, Oracle and HP, and they are stuck at 32 sockets. Extremely expensive servers, one single 32 socket server at $35 million. Linux: No problemo sailing past 32 sockets, hey, we talk about 100.000s of cores. Good work by the small SGI company (the largest UV2000 server has 262.144 cores). And also, the same work by the startup ScaleMP - also selling 1000s of sockets. For a cheap price. But hey, why be modest and stop at a quarter million cores? Why not a quarter million sockets? Or a couple of millions?
There is no problem here? What the three largest companies can not do, under decades, SGI and ScaleMP and other Linux startups has no problem with? Quarter of million of cores? Are you sh-tting me? Do you really believe it is a SMP server, used for SMP workloads, even though both SGI and ScaleMP says their servers are for HPC clustering workloads?
And how do you explain the heavy use of HPC libraries such as MPI in the UV2000 clusters? You will never find MPI in an enterprise business system. They are only used for scientific computations. And an SMP server does not use MPI at all, didn't you know? http://www.google.se/url?sa=t&rct=j&q=&...
Very simple: MPI is a technique to ensure data locality for processing, regardless of whether it is a cluster or a multi-socket system. It reduces the number of hops data has to traverse, regardless of whether it is an SMP link between sockets or a network interface between independent systems. Fewer hops means greater efficiency and greater efficiency equates to greater throughput.
Also, if you had actually read that link you'd have realized that the UV2000 is not a cluster. It is a fully coherent system with up to 64 TB of globally addressable memory.
"Regarding my link about Ted Tso, talking about filesystems. You missed my point. He says explicitly, that Linux kernel developers did not have access to large 48 core systems. "
A lot of Linux developers are small businesses or individuals, as is the beauty of open source software - everyone can contribute. It also means that not everyone will have equal access to resources. There are some large companies that invest heavily into Linux, like IBM. They have managed to tune Linux to get to within 2.7% of the performance of AIX on their 32 socket, 256 core, 1024 thread p795 system in SPECjbb2005. Considering the small 2.7% difference, I'd argue that Linux scales rather well compared to AIX.
"Then you talk about how large the SGI UV2000 servers are, etc etc. And my link where SGI explains that their predecessor Altix server is only suitable for HPC workloads - is rejected by you."
Yes, and rightfully so, because you're citing a decade old link to their predecessor that has a different architecture.
"But you do not accept that the UV2000 is a successor to Altix - but instead the UV2000 server is a full blown SMP server somehow. When the huge companies IBM and Oracle and HP are stuck at 32 sockets, suddenly, SGI has no problems scaling to 1000s of cpus for a very cheap price. You dont agree something is a bit weird in your logical reasoning?"
Not at all. SGI developed the custom glue logic, NUMALink6, to share memory and pass coherency throughout 256 sockets. Oracle developed the same type of glue logic for SPARC that SGI developed for x86. Only thing noteworthy here is that SGI got this type of technology to market first in their 256 socket system before Oracle could ship it in their 96 socket systems. The source for this actually comes from a link that you kindly provided: http://www.theregister.co.uk/2013/08/28/oracle_spa...
And for the record, IBM has a similar interconnect as well for the POWER7. The thing about the IBM interconnect is that it is not cache coherent across the glue logic, though the 32 dies on one side of the glue are fully cache coherent. The main reason for losing coherency in this topology is that the physical address space of the POWER7 can be exceeded, at which point coherency would simply fail anyway. All the memory in these systems is addressable through the virtual memory though. Total number of dies is 16,384, with 131,072 cores and 524,288 threads. Oh, and this system can run either AIX or Linux when maxed out. Source: http://www.theregister.co.uk/Print/2009/11/27/ibm_...
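For anyone who wants to see how those totals hang together, here is a rough sketch that derives the per-die figures from the numbers quoted above. The variable names are illustrative, and the count of coherent domains assumes every coherent group holds exactly 32 dies, which the comment implies but does not state outright.

```python
# Totals quoted above for the largest POWER7 configuration.
dies = 16_384
cores = 131_072
threads = 524_288

cores_per_die = cores // dies          # implied cores on each die
threads_per_core = threads // cores    # implied SMT width per core

# Assumption: every coherent group is exactly 32 dies.
coherent_domains = dies // 32

print(cores_per_die, threads_per_core, coherent_domains)
```

The implied 8 cores per die and 4 threads per core match the POWER7 design, so the quoted totals are at least internally consistent.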
So really, all the big players have this technology. The differences are just how many sockets a system can have before this additional glue logic is necessary, how far coherency goes and the performance impact of the additional traffic hops the glue logic adds.
"There is no problem here? What the three largest companies can not do, under decades, SGI and ScaleMP and other Linux startups has no problem with? Quarter of million of cores? Are you sh-tting me? Do you really believe it is a SMP server, used for SMP workloads, even though both SGI and ScaleMP says their servers are for HPC clustering workloads?"
The SGI UV2000 fits all the requirements for a big SMP box: cache coherent, global address space and a single OS/hypervisor for the whole system. And as I mentioned earlier, both IBM and Oracle also have their own glue logic to scale to large number of cores.
As for the whole 'under decades' claim, scaling to large numbers of cores hasn't been possible until relatively recently. The integration of memory controllers and point-to-point coherency links has vastly simplified the topology for scaling to a large number of sockets. To scale efficiently with a legacy FSB architecture, the north bridge chip with the memory controller would need to have a FSB connection to each socket. Want 16 sockets? The system would need 16 FSBs stemming off of that single chip. Oh, and for 16 sockets the memory bandwidth would have to increase as well, figure one DDRx channel per FSB. That'd be 16 FSB links and 16 memory channels coming off of a single chip. That is not practical by any means. IBM in some of their PowerPC/POWER systems used a ring topology before memory controllers were integrated. Scaling there was straightforward: just add more hops on the ring, but performance would suffer due to the latency penalty for making each additional hop.
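To put rough numbers on the two legacy topologies described above, here is a minimal sketch. Both function names are hypothetical, the one-channel-per-FSB ratio is the rule of thumb from the comment, and the ring model is a plain textbook ring, not a description of any specific IBM system.

```python
def fsb_star(sockets, channels_per_fsb=1):
    """Links that must terminate on the single north bridge chip."""
    return {
        "fsb_links": sockets,
        "memory_channels": sockets * channels_per_fsb,
    }


def ring_worst_case_hops(sockets, bidirectional=True):
    """Farthest distance between two sockets on a simple ring.

    Assumption: the comment does not say whether the rings were
    one-way or two-way, so both cases are offered.
    """
    return sockets // 2 if bidirectional else sockets - 1


for n in (4, 8, 16):
    print(n, fsb_star(n), ring_worst_case_hops(n))
```

The star layout concentrates every link on one chip, while the ring keeps per-chip wiring constant and pays for it in hops, which is the trade-off the comment describes.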
As for what the future holds, both Intel and IBM have been interested in silicon photonics. By directly integrating fiber connections into chip dies, high end Xeons and POWER chips respectively will scale to even further heights than they do today. By ditching copper, longer distances between sockets can be obtained without a signal repeater, a limiting factor today.
Yes, you are insightful, learned, and express yourself with linearity. "You're a teacher." Thanks, but where are your other thoughts? Cheers from Thomas in Vancouver, Canada
The E7 v2 family of processors should give Intel a seat at the scale-up table, with architectural support for 15 cores/socket, 32 socket systems and 1.5 TB RAM per socket. IE: A single system with 480 fat cores and 48TB RAM.
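The totals in that last sentence follow directly from the per-socket figures. A quick sketch, using only the numbers quoted in the comment (variable names are illustrative):

```python
# Per-socket limits quoted above for the E7 v2 family.
cores_per_socket = 15
max_sockets = 32
tb_per_socket = 1.5

total_cores = cores_per_socket * max_sockets
total_ram_tb = tb_per_socket * max_sockets

print(f"{total_cores} cores, {total_ram_tb} TB RAM")
```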
Sure, they aren't going to take the top of the scale-up charts with this generation, but they should have another belly-busting course of eating into the remaining Sparc, Power and (yes) Itanium niches. (It's only a matter of time until scale-up will be owned by Intel, with all other architectures being in decline.. IE: Oracle, and IBM will only be able to justify so much development into a lagging platform.)
Personally, I am curious if in 15-20 years we'll be talking about ARM64 servers taking on/out the legacy x86 scale-up servers.
Intel based servers can scale over 8 CPUs. While you seem very biased toward "big iron", it should be noted that each vendor has some proprietary solution to connect multiple sockets. And Intel is offering a non-proprietary way to connect up to 8 sockets. Above that you can use the same approach as the "big iron" Oracle/IBM solutions and offer a proprietary interconnect between groups of 8 Intel CPUs. Even IBM used to do that - I was working with Intel based servers with many more CPU sockets than the maximum of 4 sockets supported back then by Intel. Those servers used a proprietary IBM interconnect between boxes, each containing 4 sockets (I think each CPU had 4 cores then), 32GB RAM and I/O.
While using two such boxes instead of one will not result in a linear performance improvement (the box interconnect is slower than the links between the inner 8 sockets), such servers use an OS that supports NUMA (non-uniform memory access) to reduce between-box communication. In addition, many enterprise applications are optimized for such NUMA scenarios and scale almost linearly. We used Windows as the OS (supports NUMA) and MS SQL as the enterprise app (supports NUMA), and scalability was excellent even above the native Intel 4/8 sockets.
And nowadays such Intel based servers are even better, with 8 CPUs (=120 cores) and 6TB RAM PER BOX, multiplied by the number of boxes you use.
End result: even without linear scaling, multi-box Intel servers can outperform IBM/Oracle servers while costing less. Your "only UNIX can scale up" comment is clearly wrong - what really keeps UNIX/IBM/Oracle in the enterprise is not scale-up ability, it is software that was historically made for those OSes. Not to mention that enterprises are VERY conservative ("can you show us 10 companies bigger than us, in our region, that use that Windows/Intel for main servers? No? Then we will stay at UNIX or IBM - no one was ever fired for choosing IBM after all ;p" - but even that is slowly changing, probably because they can see those "10 companies")
On the plus side, for Linux development, older 32 and 64 socket mainframes can now be had fairly cheap relative to their "new" pricing. This will aid the ongoing scaling development of Linux. You can grab a Superdome for under 10k, but you will still have to fill the server and additional cabinets with cells, processors and memory. But all in all, they are getting much easier to afford in the broker market.
Need to ask yourself: Why is it that IBM hasn’t published any benchmarks in 3+ years except for certain corner cases? When IBM released Power7, they released benchmarks across every benchmark out there from TPC-C, TPC-H, SPECjbb2005, SPEC_OMP, SPEC CPU2006, SAP, etc. When Power7+ came out, there were only non I/O based benchmarks released. No DB benchmarks, no STREAM benchmark,etc. So maybe Oracle had no choice but to compare against 3-year old results? And why hasn’t IBM published newer results? Maybe because Power7+ is less than a 10% improvement? That’s what IBM's own rPerf metric tells us.
With POWER8 due out later this year, I suspect they'll be updating their old benchmarks with the newer hardware.
The real question is why hasn't IBM ever submitted benchmarks for their z-series mainframes? Performance data there is very lacking. Though z-series customers tend to fall into two groups: legacy mainframe applications and those who desire ultimate RAS regardless of the performance.
Yes, we shall see what Power8 delivers and when. It's already a year late according to IBM's "3-year cadence". Power7 is 4 years old this month! As for Mainframe, it's not about performance, it's about uptime, but at some point you can get uptime through clustering and redundancy, and then performance becomes the issue. We once did a POC comparing performance of the latest Mainframe vs SPARC M6 and we estimated SPARC M6-32 to be 2-3x higher MIPS! As you can imagine, the customer is migrating.
Everyone has been suffering delays with chips it seems. Intel, even with their process advantage, looks to be 9 months to a year behind schedule for their 14 nm roll out. IBM/TSMC/GF/Samsung are similarly behind in their roll out of 22/20 nm class logic.
There has been a desire for ages to get off of mainframes in some industries. Reliability is 'good enough' and performance is better but the reason some don't migrate is simply software costs. I used to work in such a shop and the mainframe hung around due to the extensive cost of porting and validating all the legacy software. Also 'if it ain't broke, don't fix it' was a theme at that place and well, the mainframe was never broken. I figure that many mainframe shops fall into that category.
A decked out M6-32 outrunning a mainframe by 2x is within reason for some CPU tests. I'm more curious as to what specific workloads they were. In IO bound tests, the mainframe is still competitive due to the raw amount of coprocessors and dedicated hardware thrown into the niche. Flash in the enterprise has helped narrow the IO gap significantly but I don't think it has managed to surpass the ancient mainframe architecture.
Probably because most of their numbers have held up by and large to the competition. Unlike Sun SPARC and now Oracle SPARC which had disappeared from the benchmark scene for years with T1-T3 and most Fujitsu based servers. Oracle had cherry picked obscure benchmarks with T4 and now with T5 they have had a lot to make up. So, although you make it sound impressive let's not forget the past and the gap that needed to be filled.
I'm a 15 yr Sun veteran now at Oracle so yes, I agree that in the past, with older generation SPARC, especially the first generation T-Series, Sun only benchmarked where the T-Series did well and avoided benchmarks where it didn't, as it was designed for web tier workloads. That was 5 generations ago! But that's my point. A vendor isn't going to publish a result that looks poor or worse than the previous version's, so every vendor "cherry picks" as you say. Not having a benchmark tells me that either the previous version is better, or the new version isn't that much better, or it's worse (whether in throughput, per core, etc). In any case, the more benchmarks, the better the sign that it's leading. And while SPARC T4 was really the first Oracle SPARC developed processor, it caught up to competing CPUs, and with SPARC T5 and even SPARC M6, it's hard to argue that SPARC T5 is not leading. With 16 cores, 8 threads/core @ 3.6GHz, and glueless scalability to 8 sockets for the T5, and 12 cores, 8 threads/core and up to 32 sockets for the SPARC M6, now almost a year old, Intel's latest Xeon Ivy Bridge-EX has finally caught up, but in certain areas, like DB and middleware performance, it is still lacking benchmark proof points to show it's superior. And as for Power8, well, we'll just have to wait and see what the systems will deliver and when. Clearly they are aiming at SPARC for the high end, now that Itanium is all but dead, and on entry-mid range, competing against Xeon.
Great for Intel that they have finally marginally overtaken a several year old IBM box in the SAP SD benchmark. Only trouble is the 2.5x faster POWER8 (compared to POWER7) is coming in the next few months.
IBM, like Intel, bins chips by power consumption. It looks like there are indeed 250W POWER7's but they do scale down to 150W.
800W MCM for super computing, 200W POWER7 die @ 3.83 Ghz http://www.theregister.co.uk/Print/2009/11/27/ibm_... The final shipping speed was 3.83 Ghz which falls into the 3.5 to 4.0 Ghz range target in the article.
250W for high end boxes & 150W for blade systems: http://www.realworldtech.com/forum/?threadid=12393... Note that this was an early IBM paper and that 300W per socket figure could have been provisioning for future dual die POWER7+ modules
I'm trying to find the source to the 180W POWER7+ figure. The difficulty is that it appeared in a discussion about Intel's Poulson Itanium which consumes 10W less.
TDP is great for comparing chip to chip, but what really matters is system performance/watt. And although Intel's latest Xeon E7 v2 may have better TDP specs than either Power7+ or SPARC T5, when you look at the total system performance/watt, SPARC T5 actually leads today due to its higher throughput, core count, 4 x more threads, built-in encryption engines and higher optimization with the Oracle SW stack.
Assuming you mean 8 identical cores, until mainstream consumer apps appear that can use more CPU resources than the 4HT cores in Intel's high end consumer chips but which can't benefit from GPU acceleration become common it's not going to happen.
I suppose Intel could do a big.little type implementation with either core and atom or atom and the super low power 486ish architecture they announced a few months ago in the future. But in addition to thinking it was worthwhile for the power savings, they'd also need to license/work around arm's patents. I suppose a mobile version might happen someday; but don't really see a plausible benefit for laptop/desktop systems that don't need continuous connected standby like phones do.
While Intel hasn't announced any distinct plans to go this route, they're at least exploring the idea at some level. SkyLake and Knights Landing are to support the same ISA extensions and in principle a program could migrate between the two types of cores.
Er. You don't need apps to use more than 4 threads to make use of an 8 core processor. Whatever happened to running several demanding applications at once? Surely I am not the only one who does this... My Sandy-Bridge-E processor, being a few years old, is starting to show its age in such instances. I would cry tears of blood for an 8-Core Haswell based processor to replace my current 6-core chip.
Did you know: Haswell-E is supposed to be released in Q3 this year, to have up to 8 Haswell cores with HT, fit in the new revision of Socket LGA2011 (incompatible with the current desktop LGA2011), and work with DDR4 and X99 chipset. No GPU there, since it's a byproduct of server Haswell-EP.
I think, 6 cores on desktop for $300 will NOT happen this year. Because if it will, then you'll get $300 4 core i7 on mainstream 1150 & $300 6 core i7 on new 2011 simultaneously on the market. To adjust this, they'll have to sell 1150 4 core i7 for $200-$220, like Core i5 now. This is not realistic, because that's Intel we're talking about, right?...
That's actually the plan, except it won't be $300. I think the latest leaks suggest that the lowest end Haswell-E SKU will be a 6-core K series at ~$400. The other two price points remain about the same, $600 and $1000 for the 8-core SKU's.
The thing is LGA2011 mobos are really expensive, so the CPU price does not have to be that high. You can get a good B85 mobo even for less than 100 $, and LGA2011 mobos start at 250 or even 300 $. I would not pay 300 $ for a mobo, and 400 $ for a 6-core CPU, that would still be ridiculous. I hate this stagnation. The transition from 1-core to 4-core happened really quickly.
The smallest 6-core K model has been around 500$ for quite some time, so I see no problem going to 400$ this time. 8 cores for 600$ would indeed be a significant step for some, though.
Well, if Intel manages to castrate the HEDT "E" version enough so that it does not pose any threat to their Xeon revenue, price drop might happen.
However, one factor not to be underestimated is total available market and how much are target consumers for this kind of hardware willing to pay. I have no data, but for some reason I think only small % of "power users" (>very< power users) need 8 cores today and they would probably be willing to shell out $1000.
Thing is, if you are Intel, you will probably be making the calculation: what if we drop the price to, say, $600? Is this going to bring us more customers? Is this going to cannibalize some of the more lucrative Xeon market?
I suppose if Intel fuses out TSX, VT-D, ECC memory support and, of course, QPI (which is what they do anyway with Sandy-E and Ivy-E HEDT CPUs) the chip would practically be next to useless to most Xeon customers. So the remaining issue is the market.
i was hoping for 8 core ivy bridge-e chips but had to settle for 6 cores which i can easily use all of
i do a LOT of video encoding using handbrake and that program just loves cores, i easily saturate all 12 threads with my settings in handbrake so i do believe it could use a single socket 8 core well (i have read tests that show handbrake not liking dual/quad socket systems for more cores - but does improve when using lots of cores on a single socket)
You have an error on page 8: in your fourth paragraph you have the Opteron at 2.4GHz with a score of only 2481. From your graph it should have been 2.3GHz and 2723?
The article says "The Opteron core is also better than most people think: at 2.4GHz it would deliver about 2481 MIPs." - but, according to the graph, Opteron already delivers 2723 @ 2.3Ghz. So it is puzzling to see that it "would" deliver less MIPS (2481 vs 2723) at higher frequency (2.4 vs 2.3 Ghz) (regardless of any Intel results/frequencies)
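A back-of-the-envelope check makes the inconsistency concrete. The sketch below assumes the score scales linearly with clock speed, which is an idealization and not something the article claims; the function name is made up for illustration.

```python
def scale_linearly(score, from_ghz, to_ghz):
    """Estimate a score at a new clock, assuming perfect linear scaling."""
    return score * to_ghz / from_ghz


measured_mips = 2723   # value read from the graph at 2.3 GHz
estimate = scale_linearly(measured_mips, from_ghz=2.3, to_ghz=2.4)

print(round(estimate))
```

Even if real scaling is less than linear, a higher clock should not produce a lower score than the 2723 already measured, so the 2481 figure in the text looks like a mistake.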
For the more typo-sensitive reader (perhaps both technically astute and typo-sensitive):
"A question like "Does the SPARC T5 also support both single-threaded and multi-threaded applications?" must sound particularly hilarious to the our technically astute readers."
From the conclusion: "The Xeon E7 v2 chips are slated to remain in data centers for the next several years as the most robust—and most expensive—offerings from Intel."
I don't think it will be really "several" years - maybe 1-2 years later this Ivy Bridge-EX-based E7 v2 will probably be superseded by Haswell-EX-based E7 v3 with Haswell cores with AVX2/FMA, which should make a difference in pro floating point calculations and data processing, and working with DDR4.
The Ivy Bridge-EX -> Haswell-EX transition will mimic the Nehalem-EX -> Westmere-EX transition in that the core systems provided by the big OEMs will stay the same. The OEMs will offer Haswell-EX as a drop-in replacement for their existing socket 2011v1 systems. Haswell-EX -> Broadwell-EX will again use the same socket and follow a similarly quick transition. Skylake-EX will bring a new socket design (perhaps with some optical interconnects?).
At some point Intel will offer new memory buffer chips to support DDR4. This will likely require swapping out all the memory daughter cards in a system, but the motherboards from the big OEMs shouldn't change. There may also be a period where these large systems can be initially configured with either DDR3 or DDR4 based upon customer requests.
There will indeed be a quick adoption to Haswell-EX not because of AVX2 or DDR4 but rather transactional memory support (TSX). For the large databases and applications these systems are targeted at, TSX should prove to be helpful.
Coming...we had to run lots of tests in parallel, so it was not possible to make sure all systems were similar. Also, we should test with workloads that require a lot more memory to get an idea.
Note that the E7-8857 v2 has 12 cores but no HT, so it only has 12 threads as well (see http://ark.intel.com/products/75254/Intel-Xeon-Pro...). Thus it is not equivalent to a 3GHz E7-4860 v2, as the 4860 has HT for a total of 24 threads.
Also, there must be a typo either in the graph or in the text on the "single thread" integer performance test: "Opteron ... at 2.4GHz would deliver about 2481 MIPs", while - according to the graph - it already delivers 2636 @ 2.3Ghz.
Good point. There is little gain from HT in OpenFoam, but it will influence the LZMA benchmarks. So the Openfoam findings are still valid, but not the LZMA. The kernel compile is somewhat in between.
Thanks! I did not mean to imply HT matters "a lot", but it may influence some (and I admit I don't know much about how your benchmarks behave, other than parallel LZMA which I worked a lot with) - so it just does not sound right to outright call it equivalent, and I wish AT only has statements anyone can just trust :)
Very thorough review, which is what I've come to expect from Anandtech! I am interested but not very knowledgeable about the server side of computing, so this definitely filled me in on a lot of the facets of that area. Thanks for the writeup.
By the way, the "Linux Kernel Compile" page is blank, as bji noted.
While the revenue is high, just how many units are shipped? I have been wondering whether Intel would move to Mobile First, meaning Atom, tablet and laptop chips, which are low-power designs, all get the latest node first, while desktop and server stay an architecture and a node behind. That would align the desktop and Xeon E3 - E5 series.
But it seems the volume of chips doesn't quite add up, since the top-end volume is far too small? Does anyone have any idea on this?
I believe the statement "Still, that tiny amount of RISC servers represents about 50% of the server market revenues." should read "Still, that tiny amount of RISC servers represents about 50% of the high end server market revenues." Stated differently, from a revenue perspective Intel is the #1 vendor in the high end segment even though it has less than a 50% market share. Server orders are placed with vendors, not architectures. Intel has fought an uphill battle to access the high end market and it is costly. However, if Intel can amortize its development costs over a larger revenue base than any competitor, it is well positioned to maintain its share acquisition momentum.
Very nice review, I would like to see more benchmarks between E7 v2 vs RISC processors because I think the real competition is there.
Older Intel and AMD servers are not real competition for IvyBridge-EX.
It would be interesting when POWER8 is out, to give us the new figures of 8 socket benchmarks and if there is any progress on more 8+ sockets for Intel E7 v2 (built by Cray and other vendors)
I think that E7 v2 (I don't know about older vendors) can be placed in up to 32-socket systems - not natively of course.
Older Intel systems are competition, because these kind of servers are not replaced quickly. If a new generation does not deliver substantial gains, some companies will postpone replacement. In fact, very few people that already have a quad intel consider the move to RISC platforms.
But you have a point. But it is almost impossible for us to do an independent review of other vendors. I have never seen an independent review, and the systems are too scarce, so there is little chance that we can ask a friendly company to lend us one.
I meant, I have never seen an independent review of high-end IBM or SUN systems. We did one back in the T1 days, but the product performed only well in a very small niche.
Contact your Oracle rep and I am sure we'd be glad to loan you a SPARC T5 server, which we have in our loaner pool for analysts and press. Would be nice if you had a more objective view on comparisons.
If you look at Oracles Performance/Benchmark blog, we have comparisons between Xeon, Power and SPARC based on all publicly available benchmarks. As Oracle sells both x86 as well as SPARC, we sometimes have benchmarks available on both platforms to compare.
Intel and their CPU technology continues to impress. Those kind of performance increase numbers must leave their competitors gasping on the mat. Props for the smart new chip. +1
My wife would know the answer to this, considering she works for IBM, but considering software costs far exceed hardware costs on a life cycle basis, does anyone know what the licensing costs are between the different platforms?
She once had me sit down to explain to her how CPU upgrades would affect DB2 licenses. The system is more arcane and I'm not sure what the cost of each core is.
For an ERP each chip type has a rated pvu metric from IBM which determines the cost of the license. Are RISC cores priced differently than x86 cores enough to partially make up the hardware costs?
I know Oracle does that (risc core <> x86 core when it comes to licensing), but I must admit, Licensing is extremely boring for a technical motivated person :-).
In total cost of ownership calculations, where both HW and SW as well as maintenance costs are calculated, the majority of the costs (upwards of 90%) are associated with software licensing and maintenance/administration- so although HW costs matter, it’s the performance of the HW that drives the TCO. For Oracle, both Xeon and SPARC have a per core license factor of .5x, meaning 1 x license for every two cores, while Itanium and Power have a 1x multiplier, so therefore Itanium/Power must have a 2x performance/core advantage to have equivalent SW licensing costs. IBM has a PVU scale for SW licensing, which essentially is similar to Oracle but more granular in details. Microsofts latest SQL licensing follows similarly. So clearly, performance/CPU and especially per core matters in driving down licensing costs.
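The break-even reasoning in this comment can be written out as a short calculation. This is only a sketch: the core factors are the ones quoted in the comment (not checked against a current Oracle price list), and the function names are hypothetical.

```python
# Per-core license factors as quoted in the comment above; treat them as
# the commenter's figures, not as an authoritative price list.
CORE_FACTOR = {"xeon": 0.5, "sparc": 0.5, "itanium": 1.0, "power": 1.0}


def license_units_per_unit_of_work(platform: str, perf_per_core: float) -> float:
    """License units paid for each unit of throughput (lower is cheaper)."""
    return CORE_FACTOR[platform] / perf_per_core


def breakeven_speedup(platform: str, baseline: str) -> float:
    """Per-core speedup `platform` needs over `baseline` for equal license cost."""
    return CORE_FACTOR[platform] / CORE_FACTOR[baseline]


# Power vs Xeon: how much faster per core must Power be to break even?
print(breakeven_speedup("power", "xeon"))

# Sanity check: at exactly that speedup both pay the same per unit of work.
print(license_units_per_unit_of_work("power", 2.0)
      == license_units_per_unit_of_work("xeon", 1.0))
```

With a 1.0 factor against a 0.5 factor, the break-even speedup is 2x per core, which matches the 2x figure stated in the comment.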
It would have been very good to test this CPU on a 3D rendering benchmark. I can imagine the time saved in a workstation... even if the cost would be close to that of a render farm... but comparing this Xeon to other ones in that situation would have given a useful point of view.
I would like to see V-Ray benchmarked. It's fast becoming an industry standard across a number of 3D industries (started in ArchVis, is now moving into animation feature films and FX)
The author is misleading with statements and data, not to mention @Brutalizer, who comes across as very knowledgeable but only backs up claims of Oracle server performance with platitudes and boasts.
Starting with the article - comparing various cores regardless if you adjust the frequency is misleading. You need to normalize the values to show what the per core improvement is. To stay with sockets is useless and lazy. Yes, Intel customers buy servers by the socket but to understand what they are really gaining this is a much better metric. To say there is a 20 or 30% gain when there might be 50% more cores tells me the per core performance is actually lower than Westmere. This is important when using software like Oracle that would price a 15 core socket at 7.5 or 8 Oracle licenses. For software licensed by the core, customers should demand the highest performance available otherwise all you do is subsidize Uncle Larry's island. For the Power comparisons in the SAP benchmarks. You compare a 60 core to a 32 core N-1 generation Power7 server. Since Power servers scale almost linearly by frequency, the 8 core @ 4.22 GHz is 54,700. If we extrapolate that to a 4 socket or 32 cores we would be around 200K SAPS. That is quite a bit more than the 60 core Dell. Also, you could deploy a Power server as a standalone server. Nobody would deploy a mission critical workload on a standalone x86 server. Yes, I'm sure somebody will argue with me and say they do and have done it for years. Ok, but by and large we know they are always clustered and used to scale-out. Secondly, you claim how expensive the Power servers are. When was the last time you priced one Mr De Gelas? You can get a Power7+ 7R1, 7R2, or 7R4 that has price parity with a typical x86 price that includes Linux and VMware and comparably equipped. The 710 and 730 servers would be just a bit more but definitely competitive. Factor in the software savings and reduction in the number of servers required and the TCA and TCO will favor Power quickly. I do it all of the time and can back it up with hard data. You can run Power servers up to 90% utilization but rarely run x86 over 30%, maybe 35% tops.
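The SAPS extrapolation in this paragraph is easy to reproduce. A minimal sketch, using only the 54,700 SAPS for 8 cores quoted above; the `efficiency` parameter and the function name are hypothetical additions for illustration:

```python
def extrapolate_saps(saps: float, cores: int, target_cores: int,
                     efficiency: float = 1.0) -> float:
    """Scale a SAPS result to a different core count.

    efficiency=1.0 means perfectly linear scaling, which is an assumption;
    real multi-socket systems usually scale somewhat below that.
    """
    per_core = saps / cores
    return per_core * target_cores * efficiency


# 8 cores at 4.22 GHz -> 54,700 SAPS (figure quoted in the comment).
linear = extrapolate_saps(54_700, cores=8, target_cores=32)
print(f"Perfectly linear 32-core estimate: {linear:,.0f} SAPS")

# Scaling efficiency implied by the commenter's "around 200K SAPS".
implied = 200_000 / linear
print(f"Implied scaling efficiency: {implied:.1%}")
```

Perfectly linear scaling gives 218,800 SAPS, so the "around 200K" figure corresponds to roughly 91% scaling efficiency from 8 to 32 cores.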
With regard to @Brutalizer - Big claims of big servers, up to 96 TB of RAM. Who needs that? Who needs a server with 100's or 1000's of cores? The Oracle M6-32 has 1000 DIMMs to get 32 TB of memory. Tell us how this influences the MTBF of the server since the number of components is a major factor in the calculation. Next, you scoff at IBM for comparing to older servers. That is because they are talking to customers who are running older servers - consolidate those older servers onto a few or just 1 server that is inherently reliable - nothing more so than an IBM mainframe, followed by IBM Power servers. Oracle's M6-32 and M5-32 are just cabled T5 servers scaled back from 16 to 12 cores. They have little RAS and are built for marketing hype and to drive Oracle software licensing revenue. You say the Oracle M processor pricing is X and then try to paint a picture that Power servers are more expensive for a 32 socket than an 8 socket - really. A V8 luxury car is more expensive than a 4 cyl econobox. The server price is moot when the real cost is the software you run on it. With Oracle EE + RAC at $70,500 + 22% annual maintenance per core it matters. On Power I only have to license the cores I need. If I need 2 cores for Oracle then I license 2 cores. On x86, the 15 core is 8. (15 x .5 = 7.5 rounds to 8). Oracle M series is also .5 so your 128 cores on SAP S&D to my 64 core Power7 at 1.0 puts us about equal. However, most customers don't run the servers with one workload. You will say your LDOMs are efficient but compared to the Power Hypervisor they won't hold a candle when it comes to efficiently using the cores and threads - all of them in true multi-thread fashion. With Power8 coming out soon both Intel and Oracle will go back to smelling the fumes of Power servers. To customers out there. It isn't about being Ford or Chevy. This isn't college - don't root for your team even when they are no good. Your business has to not only survive but hopefully thrive.
Do that on a platform that controls the largest cost which is software and Full Time Equivalents - that is Power servers.
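To put rough dollar figures on the licensing argument above, here is a sketch using the $70,500 list price and 22% annual maintenance quoted in the comment. It assumes the price applies per license after the core factor, that maintenance is a flat percentage of list price each year, and that no discounts apply; the three-year horizon and the function name are hypothetical.

```python
import math

LIST_PRICE = 70_500       # USD, Oracle EE + RAC figure quoted in the comment
MAINTENANCE_RATE = 0.22   # annual maintenance as a fraction of list price


def license_cost(cores: int, core_factor: float, years: int) -> float:
    """Upfront license cost plus maintenance over `years`.

    Licenses are cores * core_factor rounded up to a whole license,
    following the "15 x .5 = 7.5 rounds to 8" example in the comment.
    """
    licenses = math.ceil(cores * core_factor)
    upfront = licenses * LIST_PRICE
    return upfront + upfront * MAINTENANCE_RATE * years


# A full 15-core Xeon socket (factor 0.5) versus 2 licensed Power cores (factor 1.0).
print(f"15-core Xeon, 3 years: ${license_cost(15, 0.5, 3):,.0f}")
print(f"2 Power cores, 3 years: ${license_cost(2, 1.0, 3):,.0f}")
```

The comparison only holds if the workload really fits in two Power cores, which is the commenter's premise rather than something the numbers show.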
Well, I must say that this article is clearly Intel biased, with a lot of misleading and downright wrong statements about Oracle and SPARC. Here are some accurate and substantiated counters:
"Sun/Oracle's server CPUs have been lagging severely in performance" This is wrong, considering that since the SPARC T4 release, and now SPARC T5 and SPARC M6 announcements, Oracle has announced 20+ world record benchmarks across *all* of the public, audited benchmarks from TPC-C, TPC-H @ 1TB, 3TB, 10TB to SPECjEnterprise2010 and SPECjbb2013. Many of them are still valid today, almost a year later.
What I'd like to ask is: where are the 8-socket Xeon E7 v2 benchmarks to compare to SPARC? There's only one today - SAP. And this doesn’t demonstrate database performance or Java application performance. There are also no 4-socket or 8-socket benchmarks on TPC-C, TPC-H, SPECjEnterprise2010.
Even with SPECjbb2013, there's just a 4-socket result, and if you compare performance/core, the SPARC T5-2 @ 114,492 max-jOPS (just 32 cores) has a 1.3x performance/core advantage over the NEC Express5800/A040b with 60 x Intel E7-4890 v2 2.8 GHz cores @ 177,753 max-jOPS.
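The per-core comparison can be recomputed directly from the two results quoted in this paragraph. A minimal sketch; the system labels and function name are just for illustration:

```python
def per_core(max_jops: float, cores: int) -> float:
    """SPECjbb2013 max-jOPS divided by core count."""
    return max_jops / cores


# Results and core counts as quoted in the comment.
sparc_t5_2 = per_core(114_492, 32)
nec_xeon = per_core(177_753, 60)
ratio = sparc_t5_2 / nec_xeon

print(f"SPARC T5-2:         {sparc_t5_2:,.1f} max-jOPS per core")
print(f"NEC Express5800:    {nec_xeon:,.1f} max-jOPS per core")
print(f"Per-core advantage: {ratio:.2f}x")
```

With the figures exactly as quoted, the ratio works out to about 1.21x rather than 1.3x, so the 1.3x claim may rest on inputs not given in the comment.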
"As usual, the published benchmarks are very vague and are only available for the top models " As of today, there is not a single real world application/database benchmark that shows Xeon having superior throughput, response times or even price/performance comparing systems with same # of CPUs to SPARC T5. You can go here to see all the comparisons with full transparency. https://blogs.oracle.com/BestPerf/
"and the best performing systems come with astronomic price tags ($950,000 for two servers, some networking, and storage... really?)." You do realize you are linking to Oracle Exadata which isn't a server but an Engineered system with many servers, storage and networking all built-in and based on XEON??
Why are you not linking to SPARC T5 server pricing, which is here, since that’s what you are trying to discredit? Here's the SPARC T5-2 pricing, which is very aggressive compared to x86 & IBM Power7+ systems. https://shop.oracle.com/pls/ostore/f?p=dstore:5:90...
Or better yet, look at a public benchmark where full HW and SW pricing is disclosed?
A SPARC T5-4 is 2.4x faster than the 8-socket Xeon E7-4870 based HP DL980 G7 on TPC-H at 10TB. The SPARC T5-4 server HW fully configured costs $268,853, HP DL 980 costs $268,431.
Basically the same cost, and the SPARC T5 is 2.4x faster than Westmere-EX. Where's Xeon E7 v2 to showcase its 2x faster??
On TPC-C OLTP benchmark, a SPARC T5-8 has a $/perf of .55USD/tpmC, versus fastest Oracle x2-8 of .89 USD/tpmC and IBM x3850 of .59USD/tpmC. SPARC T5-8 is 70% faster per CPU than Westmere-EX based Oracle x2-8. http://www.tpc.org/tpcc/results/tpcc_results.asp?o...
Interesting discussions. Just for clarification, there is an x86 server that goes beyond 8 sockets: bullion (Xeon E7 48xx, up to 16 sockets with near-linear scaling). Bull (legacy GE & Honeywell mainframe) has leveraged technology used in its mainframe & HPC systems to build bullion...the world's FASTEST x86 server. bull.us/bullion
125 Comments
DanNeely - Friday, February 21, 2014 - link
" In a nutshell, every effort is made to ensure you cannot compare these with the servers of "Big Blue" or the x86 competition."
Of course not. If they did that it would interfere with their deceptive marketing campaign with the banner headline "An Oracle Box costing $$stupid is several times faster than an IBM box costing $$3xStupid"; where if you look up model dates you see they're comparing against a several year old IBM box against their latest and greatest. (I've never been bored enough to dig that deeply; but my inner cynic suspects that they're probably larding a bunch of expensive stuff that doesn't do anything for java onto the IBM box to inflate its price even more.)
Brutalizer - Sunday, February 23, 2014 - link
The reason Oracle sometimes compares to an older IBM model is that IBM has not released newer benchmarks. IBM does the same: for instance, IBM claims that one z10 Mainframe with 64 sockets can replace 1,500 x86 servers. If you dig a bit, it turns out all the x86 servers are like 1GHz Pentium3 with 256MB RAM or so - and they all idle. Yes, literally, all x86 servers idle, whereas the Mainframe is 100% loaded. What happens if some x86 servers start to do some work? The Mainframe will choke. I can emulate a Mainframe on my laptop with the open source emulator "TurboHercules"; is it ok if I claim that my laptop can replace three IBM Mainframes (if they all idle)?
Regarding this Intel Xeon E7 cpu. Sure it is nice, but it has twice the number of cores as the competition. Another thing is that the largest x86 servers have 8 sockets. There are no larger x86 servers than that. The only 32 socket servers are Unix and Mainframes. Some Unix servers even have 64 sockets. Thus, the x86 does not scale above 8 sockets.
For scalability, you must distinguish between scale-out and scale-up. Scale-out is a cluster, just add a new node and you have increased scalability. Clusters are used for HPC number crunching workloads where you run a tight for loop on some independent data (ideally each node fits everything in the cpu cache). Some examples of Scale-out servers (clusters) are all servers on the Top-500 supercomputer list. Other examples are SGI Altix / UV2000 servers or the ScaleMP server, they have 10,000s of cores and 64 TB RAM or more, i.e. cluster. Sure, they run a single unified Linux kernel image - but they are still clusters. If you read a bit, you will see that the SGI servers are using MPI. And MPI are used on clusters for HPC number crunching.
Scale-up servers are one single fat huge server. They might have 16 or 32 sockets, some even have 64 sockets! They weigh 1000 kg and cost many many millions. For instance the old IBM P595 Unix server used for the old TPC-C record has 32 sockets and costs $35 million (no typo). One single server with 32 cpus costs $35 million. You will never ever see these prices on clusters. If you buy an SGI server with 100s of sockets, you will essentially pay the same price as buying individual nodes with the same number of sockets. But scale-up servers need heavy redesign and innovative scalability tech, and that is the reason a 16 or 32 socket server costs many many many times more than an SGI cluster having 100s of sockets. They are not in the same arena. These scale-up servers are typically used for SMP workloads, not HPC workloads. SMP workloads are typically large databases or Enterprise ERP workloads. This code is heavily branch intensive, so you can not fit it into a cpu cache. It branches everywhere, and clusters can not run these Enterprise workloads because the performance would be very bad. If you need to run Enterprise workloads (where the big margin and big money is) you need to go to 32 socket servers. And they are all RISC or Mainframe servers. Examples are IBM P795, Oracle M6-32, Fujitsu M10-4S, HP Superdome/Integrity. They all run AIX, Solaris or HP-UX and they all have up to 32 or 64 sockets. Some attempts have been made to compile Linux for these huge servers, but the results have been bad because Linux has problems scaling above 8 sockets. The reason is that the Linux kernel devs do not have access to a 32 socket SMP server, because they don't exist, so how can the Linux kernel be optimized for 32 sockets? Ted Tso, the famous Linux kernel developer, writes:
http://thunk.org/tytso/blog/2010/11/01/i-have-the-...
"...Ext4 was always designed for the “common case Linux workloads/hardware”, and for a long time, 48 cores and large RAID arrays were in the category of “exotic, expensive hardware”, and indeed, for much of the ext2/3 development time, most of the ext2/3 developers didn’t even have access to such hardware...."
Ted Tso considers servers with 48 cores in total, to be huge and out of reach for Linux developers. He is not talking about 48 socket servers, but 48 cores which is chicken shit in the mature Enterprise arena.
For instance the Big Tux HP server, compiled Linux to 64 socket HP integrity server with catastrophic results, the cpu utilization was ~40%, which means every other cpu idles under full load. Google on Big Tux and read it yourself.
There is a reason the huge Linux servers such as SGI UV2000 with 1000s of cores are so cheap in comparison to 16 socket or 32 socket Unix servers, and why the Linux servers are exclusively used for HPC number crunching workloads, and never SMP workloads:
SGI servers are only used for HPC clustered workloads, and never for SMP enterprise workloads:
http://www.realworldtech.com/sgi-interview/6/
"Typically, much of the work in HPC scientific code is done inside loops, whereas commercial applications, such as database or ERP software are far more branch intensive. This makes the memory hierarchy more important, particularly the latency to main memory. Whether Linux can scale well with a workload is an open question. However, there is no doubt that with each passing month, the scalability in such environments will improve. Unfortunately, SGI has no plans to move into this SMP market, at this point in time."
Same with the ScaleMP Linux server with 1000s of cores, is never used for SMP workloads:
http://www.theregister.co.uk/2011/09/20/scalemp_su...
"The vSMP hypervisor that glues systems together is not for every workload, but on workloads where there is a lot of message passing between server nodes – financial modeling, supercomputing, data analytics, and similar parallel workloads. Shai Fultheim, the company's founder and chief executive officer, says ScaleMP has over 300 customers now. "We focused on HPC as the low-hanging fruit."
The difficult thing is to scale well above 8 sockets. You can release one single strong cpu, which does not scale. To scale above 8 sockets is very difficult, ask Intel. Thus, this Intel Xeon E7 cpu is only used in servers with up to 8 sockets. For more oomph, you need 32 or even 64 sockets - Unix or Mainframes. SGI Linux servers can not replace these large Unix servers. And that is the reason Linux never will venture into the lucrative Enterprise arena, and never replace large Unix servers. The largest Linux servers capable of Enterprise SMP workloads have 8 sockets. The Linux clusters don't count.
Another reason why this Intel Xeon E7 can not touch the high end server market (beyond scalability limitations) is that the RAS is not good enough. RAS is very very expensive. For instance, IBM Mainframes and high end SPARC cpus can replay an instruction if there was an error. x86 can not do this. Some Mainframes have three cpus and compare every computation, and if there is an error, the failing cpu will shut down. It is very very expensive to create this tailor made hardware. It is easy to get good performance, just turn the GHz up to the point of instability. But can you rely on that hardware? No. Enterprise needs reliability above all else. You must trust your hardware. It is much better to have one slower reliable server, than a super fast cranked up GHz where some computations are false. No downtime! x86 can not do this. The RAS is lacking severely behind and will take decades before Intel can catch up on Unix or Mainframe servers. And at that point - the x86 cpus will be as expensive!
Thus:
-Intel Xeon E7 does not scale above 8-sockets. Unix does. So you will never challenge the high end market where you need extreme performance. Besides, the largest Unix servers (Oracle) have 32TB RAM. Intel Xeon E7 has only 6TB RAM - which is nothing. So x86 does not scale cpu wise, nor RAM wise.
-Intel Xeon E7 has no sufficient RAS, and the servers are unreliable, besides the x86 architecture which is inherently buggy and bad (some sysadmins would not touch a x86 server with a ten feet pole, and only use OpenVMS/Unix or Mainframe):
http://www.anandtech.com/show/3593
-Oracle is much much much much cheaper than IBM POWER systems. The Oracle SPARC servers pricing is X for each cpu. So if you buy the largest M6-32 server with 32TB of RAM you pay 32 times X. Whereas IBM POWER systems costs more and more the more sockets you buy. If you buy 32 sockets, you pay much much much more than for 8 sockets.
Oracle will release a 96-socket SPARC server with up to 96TB RAM. It will be targeted for database work (not surprisingly as Oracle is mainly interested in Databases) and other SMP workloads. Intel x86 will never be able to replace such a huge monster. (Sure, there are clustered databases running on HPC servers, but they can not replace SMP databases). Look at the bottom pic, to see how all sockets are connected to each other in 32 socket configuration. There are only 2-3 hops to reach each node, which is very good. For HPC clusters, the worst case requires many many hops, which makes them unusable for SMP workloads
http://www.theregister.co.uk/2013/08/28/oracle_spa...
TerdFerguson - Sunday, February 23, 2014 - link
Great post, Brutal. Where can I read more of your writing?
JohanAnandtech - Sunday, February 23, 2014 - link
32 sockets to run SMP workloads. " typically large databases or Enterprise ERP workloads". Sounds like we are solving a problem with hardware instead of being innovative in software.
"Intel Xeon E7 has only 6TB RAM - which is nothing".
Dangerous comment. 12 TB is possible with an octal Xeon at a fraction of the cost of the unix boxes you talk about. 1 - 12 TB is enough for a massive part of the market, even a large part of the "lucrative" enterprise market.
I agree with you that there are some workloads which are out of the Xeon's league. But it is shrinking...each time a bit more.
"than a super fast cranked up GHz where some computations are false"
That is another bad statement without any proof.
"The RAS is lacking severely behind and will take decades before Intel can catch up on Unix or Mainframe servers. And at that point - the x86 cpus will be as expensive!"
Considering that the vast majority of problems is related to software (drivers inclusive), I doubt very much that even better RAS can make a big difference. A mature software stack is what make these monster servers reliable, the hardware plays a small role.
Secondly, Intel charges just as much as the market is willing to pay. They can spread the core development over many more CPUs than the RISC vendors, so chances are that they will never be as expensive as the RISC vendors.
FunBunny2 - Sunday, February 23, 2014 - link
-- Sound like we are solving a problem with hardware instead of being innovative in software
Well, it depends on what one means by "innovation". The Kiddie Koders have been recreating the likes of IDMS & IMS (early to mid 1960s approaches), all with newer names but identical semantics and storage models. The way to leverage such machines, relational data is the answer. Minimum storage footprint, DRI, and such. Use SSD, and beat the crap out of these neer-do-well RBAR messes.
xakor - Sunday, February 23, 2014 - link
"Innovative software stacks" might imply something modern and better like immutable databases which are at the opposite end of the spectrum vs IMS placing relational databases inbetween. Read up http://engineering.linkedin.com/distributed-system... concrete examples of good paradigms would be Datomic as well as Event Store.
xakor - Sunday, February 23, 2014 - link
Don't get me wrong, those databases benefit from huge servers with loads of RAM just the same.
Brutalizer - Sunday, February 23, 2014 - link
6TB or 12TB is not really interesting as we are entering the Large Data age. Oracle has 32TB today, and with compression you can run huge databases from RAM. And the 96-socket server will have 96TB RAM, which will run databases even faster. Databases are everything, they are at the heart of a company, without databases the company will halt. There are examples of companies not having a backup of their database going bankrupt when their database got wiped out because of a crash. The most important part of a company is the database, the information.
I am trying to say that it is better to have a slow and 100% reliable server, than a fast overclocked server that is a bit unstable - for Enterprise customers. There are things that must not go down, no crashes allowed.
For large workloads, Oracle SPARC is widening the gap to all other cpus, because Oracle is doubling performance every generation. Intel does not do that, nor does IBM. Try to benchmark an 8-socket x86 server against the Oracle 32-socket SPARC M5-32 monster server. Or against the Fujitsu 64 socket M10-4S server sporting the Fujitsu developed SPARC Venus cpu:
http://www.theregister.co.uk/2012/10/01/fujitsu_or...
Or the coming 96-socket SPARC server. :)
stepz - Wednesday, February 26, 2014 - link
A 32TB or 96TB server is also not really interesting for companies dealing with "Big Data" and big databases. What happens when your working set grows even more? Shut your company down and wait until Oracle manages to build an even larger database? These monsters are mainly interesting to companies where lack of software development foresight and/or capability had engineered them into a corner where they have to buy themselves out by getting a larger hammer. Smarter organizations pour their R&D into making their software and databases scale out and provide RAS on the cluster level. The monsters, while very sexy, are interesting for a tiny fraction of huge dinosaur corporations, and even those will slowly die out by succumbing to their own weight. The dying out will of course take a long time due to the amount of fat these corporations have managed to accumulate, providing ample lucrative options for companies facilitating their death by providing stupidly expensive solutions to problems better solved by changing how the game is played.
Kevin G - Monday, February 24, 2014 - link
Intel's advantage in CPU design stems from massive consumer usage. The individual Ivy Bridge core used in these 15 core monsters is the same fundamental design that was introduced to notebooks/desktops in 2012. Essentially the end consumers get to be the guinea pigs and any errata found within the first six months can be adopted into the server design before it ships. What makes these a server CPU is the focus on IO and RAS features outside of the CPU core (which have their own inherent design costs).
IBM and the other RISC vendors don't have the luxury of a high volume design. Mainframe installations number between 10,000 and 20,000 depending on source. Not very many at either end of that spectrum. IBM's POWER installations are several times larger in terms of units but still dwarfed by just the x86 server unit shipments. On the high end, this has led to some rather large prices from IBM:
http://www-01.ibm.com/common/ssi/ShowDoc.wss?docUR...
The one thing that matters for RAS is just uptime. The easiest way to get there is to cluster basic services so that a single node can be taken offline and addressed while the external interface fails over to another node. This basic principle is true regardless of hardware as you want to run a system in a minimum of a pair, ideally a primary pair with an offsite backup system. The one nice thing is that software licensing here isn't as dreadful as scaling up: often there is a small discount to make it less painful. Virtualization of smaller systems have helped in terms of RAS as being able to migrate running systems around a VM farm. Hypervisors are now supporting shadow images so that there is no additional network traffic for a VM to fail over to another node in case of a hardware failure. The x86 platform in many cases is 'good enough' that 99.999% uptime can be achieved with forward thinking resource planning.
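To make the 99.999% figure and the "run in a pair" advice concrete, here is a small sketch. It assumes node failures are independent and failover is instantaneous, which real clusters only approximate; the function names are hypothetical.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Allowed downtime per year for a given availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


def pair_availability(single: float) -> float:
    """Availability of two redundant nodes (as a fraction, 0..1).

    Assumes independent failures and instant failover (both assumptions).
    """
    return 1 - (1 - single) ** 2


print(f"99.999% uptime allows {downtime_minutes_per_year(99.999):.2f} min/year of downtime")
print(f"Two 99% nodes in a pair: {pair_availability(0.99):.4%} available")
```

Under those assumptions, five nines leaves a little over five minutes of downtime per year, and pairing two modest nodes already moves availability from 99% to about 99.99%.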
Brutalizer - Tuesday, February 25, 2014 - link
Clusters can not replace SMP servers. Clusters can not run SMP workloads.
Kevin G - Tuesday, February 25, 2014 - link
I'm sorry, but it is considered best practice to run databases in pairs for redundancy. For example, here is an Oracle page explaining how clustering is used to maintain high availability: http://docs.oracle.com/cd/B28359_01/server.111/b28...
Other databases like MySQL and MS SQL Server have similar offerings.
There is a reason why big hardware like this is purchased in pairs or sets of three.
EmmR - Friday, March 14, 2014 - link
Kevin G, you are actually correct. We are in the process of comparing the performance of Power7+ vs Xeon v2 for an SAP batch workload, and we got pretty much the same arguments from our AIX guys as Brutalizer mentioned.
We are using real batch jobs rather than a synthetic benchmark, and we set up each system to compare core-for-core, down to running a memory defrag on the Power system to make sure memory access is as good as possible. The only thing we could not fix is that in terms of network access, the Intel system was handicapped.
What we are seeing is that we can tune the Intel system to get basically similar performance (less than 5% difference in total runtime) to the Power7+ system (P780). This was quite unexpected, but it's an illustration of how far Intel and the hardware vendors building servers/blades based on those CPUs have come.
Kevin G - Monday, March 17, 2014 - link
Looking at the Xeon E7 V2's right now is wise since they're just hitting the market and the core infrastructure is expected to last three generations. It wouldn't surprise me if you can take a base system today using memory daughter cards and eventually upgrade it to Broadwell-EX and more DDR4 memory by the end of the product life cycle. This infrastructure is going to be around for a while.
POWER7+ on the other hand is going to be replaced by the POWER8 later this year. I'd expect it to perform better than the POWER7+, though how much will have to wait for the benchmarks after it is released. There is always going to be something faster/better/cheaper coming down the road in the computing world. Occasionally waiting makes sense due to generational changes like this. Intel and IBM tend to leapfrog each other and it is IBM's turn to jump.
Ultimately if you gotta sign the check next week, I'd opt for the Xeon but if you can hold off a few months, I'd see what the POWER8 brings.
EmmR - Monday, March 17, 2014 - link
Power8 will be interesting to look at, but based on current data it will have to yield a pretty impressive performance boost over Power7+ (and Xeon v2) in order to be competitive on performance per dollar spent.
Kevin G - Monday, March 17, 2014 - link
IBM is claiming two to three times the throughput over POWER7+. It isn't hard to see where most of that gain comes from: increasing the core count from 8 to 12. That change alone will put it ahead of the Xeon E7 v2's in terms of raw performance. Minor IPC and clock speed increases are expected too. The increase from 4 way to 8 way SMT will help some workloads, though it could also hurt others (IBM does support dynamic changes in SMT so this is straightforward to tune). The rest will likely come from system level changes like lower memory access times thanks to the L4 cache on the serial-to-parallel memory buffer and more bandwidth all around. What really interests me is that IBM is finally dropping the GX bus they introduced for coherency in the POWER4. What the POWER8 does is encapsulate coherency over a PCIe physical link. It'll be interesting to see how it plays out.
As you may suspect, the cost of this performance may be rather high. We'll have to see when IBM formally launches systems.
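A rough way to see where a "two to three times" claim could come from is to count hardware threads per socket. The sketch below uses only the core counts and SMT widths mentioned above; the function name is made up for illustration, and hardware threads are of course not the same thing as delivered throughput.

```python
def hardware_threads(cores_per_socket: int, smt_ways: int) -> int:
    """Hardware threads exposed by one socket."""
    return cores_per_socket * smt_ways


# Figures taken from the comment above: 8 cores / SMT4 vs 12 cores / SMT8.
power7_plus = hardware_threads(cores_per_socket=8, smt_ways=4)
power8 = hardware_threads(cores_per_socket=12, smt_ways=8)

core_ratio = 12 / 8
thread_ratio = power8 / power7_plus

print(f"POWER7+: {power7_plus} threads/socket, POWER8: {power8} threads/socket")
print(f"core ratio {core_ratio:.2f}x, thread ratio {thread_ratio:.2f}x")
```

The claimed range sits between the 1.5x core increase and the 3x thread increase, which is about what you would expect if SMT8 helps some workloads a lot and others barely at all.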
amilayajr - Thursday, March 6, 2014 - link
I think Brutalizer is saying that this new Xeon CPU is aimed at a specific market. Unix has long been the backbone of the internet, and Intel wants to cover as much of the general server market as it can. Sure, it's a nice CPU, but as far as reliability goes, I would rather use a slower system that is reliable in its calculations. I would still give Intel a thumbs up for trying something new or updating the CPU. As for replacing Unix servers for large enterprise database workloads, that probably won't happen for a long time. I would tell Intel to leave this area to the real experts who focus solely on this market. Intel is just covering its turf in the smaller-scale server market.
Kevin G - Thursday, March 6, 2014 - link
The x86 servers have caught up in RAS features. High end features like hot memory add/remove are available on select systems. (Got a bad DIMM? Replace it while the system is running.) Processor add/remove on a running system is also possible on newer systems but requires some system level support (though I'm not immediately familiar with a system offering it.) In most cases with the base line RAS features, Xeons are more than good enough for the job. Hardware lockstep is also an option on select systems.
Uses for ultra high end features like two bit error correction for memory, RAID5-like parity across memory channels, and hot processor add/remove are a very narrow niche. Miscellaneous features like instruction replay don't actually add much in terms of RAS (replay on Itanium is used mainly to fill up unused instruction slots in its VLIW architecture, whereas lock step would catch a similar error in all cases). Really, the main reason to go with Unix is on the software side, not the hardware side anymore.
djscrew - Wednesday, March 12, 2014 - link
"Sound like we are solving a problem with hardware instead of being innovative in software."
that doesn't happen... ever... http://www.anandtech.com/show/7793/imaginations-po... ;)
mapesdhs - Sunday, February 23, 2014 - link
Brutalizer writes;
"Some examples of Scale-out servers (clusters) are all servers on the Top-500 supercomputer list. Other examples are SGI Altix / UV2000 servers or the ScaleMP server, they have 10,000s of cores and 64 TB RAM or more, i.e. cluster. Sure, they run a single unified Linux kernel image - but they are still clusters. ..."
Re the UV, that's not true at all. The UV is a shared memory system with a hardware MPI
implementation. It can scale codes well beyond just a few dozen sockets. Indeed, some key
work going on atm is how to scale relevant codes beyond 512 CPUs, not just 32 or 64.
The Cosmos installation is one such example. Calling a UV a cluster is just plain wrong.
Its shared memory architecture means it can handle very large datasets (hundreds of
GB) and extremely demanding I/O workloads; no conventional 'cluster' can do that.
Ian.
Brutalizer - Sunday, February 23, 2014 - link
Have you read about the ScaleMP Linux server (it has 8192 cores or even more) in my link above? It also has a shared memory system, running a single Linux kernel image. They solve the scalability problem by using a software hypervisor that tricks Linux into believing it is running on a SMP server, and not a cluster. If you read the post in that link, a programmer writes:
"...I tried running a nicely parallel shared memory workload (75% efficiency on 24 cores in a 4 socket opteron box) on a 64 core ScaleMP box with 8 2-socket boards linked by infiniband. Result: horrible. It might look like a shared memory, but access to off-board bits has huge latency...."
Thus, the huge ScaleMP Linux server is only good for workloads where each node runs independent code, with little interaction to other nodes - that is the hallmark of HPC number crunching stuff running on clusters.
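For readers who want to put numbers on the quoted "75% efficiency on 24 cores", here is a minimal sketch using the usual definition of parallel efficiency (speedup divided by core count). The quote gives no timings for the 64 core box, so the break-even figure is only the threshold that box would need to clear, not a measurement. Function names are made up for illustration.

```python
def speedup_from_efficiency(efficiency: float, cores: int) -> float:
    """Speedup over a single core implied by a parallel efficiency."""
    return efficiency * cores


def parallel_efficiency(serial_time: float, parallel_time: float, cores: int) -> float:
    """Efficiency = (serial_time / parallel_time) / cores."""
    return serial_time / (parallel_time * cores)


def break_even_efficiency(baseline_speedup: float, cores: int) -> float:
    """Efficiency a larger machine needs just to match a baseline speedup."""
    return baseline_speedup / cores


baseline = speedup_from_efficiency(0.75, 24)    # the 4 socket Opteron box
threshold = break_even_efficiency(baseline, 64)  # the 64 core ScaleMP box

print(f"24 cores at 75% efficiency -> {baseline:.1f}x speedup")
print(f"64 cores must exceed {threshold:.1%} efficiency to be any faster")
```

So the 64 core box only has to stay above roughly 28% efficiency to beat the 24 core box. A result described as "horrible" suggests the off-board latency pushed it below even that, which says a lot about how sensitive that particular workload is to remote memory access.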
@mapesdhs: "...Calling a UV a cluster is just plain wrong..."
Regarding whether the SGI UV server is a cluster or not: there is a very simple litmus test to find out. Is the SGI UV server used for HPC workloads or SMP workloads? SGI themselves say it is only for HPC workloads. As does ScaleMP.
If you want to prove I am wrong, and your claim is correct: show us links to customers running large SMP workloads on the SGI UV cluster. Hint: you will not find a counter example. Why?
1) SGI says they are not going to try SMP workloads (which is odd, as there is the really big money)
2) It uses MPI, which is a library used for HPC number crunching. I myself have programmed MPI for scientific computations, and I tell you that you can not rewrite Oracle Database or DB2 or MySQL using MPI without great effort. MPI is for sending code to nodes for execution. A large SMP server does not need MPI libraries or something, it is just programmed as a usual server.
So, instead of you telling me I am wrong, I suggest you just show us links with SMP workloads for the SGI UV2000 server - which is the easiest thing to settle this question. I have showed links where SGI says their big Altix server is not for SMP workloads, it is only for HPC - which means: cluster. If you can show that many customers are replacing large Unix 32 socket servers with SGI UV2000 servers - then you are right, and I am wrong. And I will shut up.
Have you not thought about why the old mature Unix servers are still stuck at 32 or 64 sockets, whereas Linux servers exist in configurations of 1-8 sockets or 100s of sockets - but nothing in between? Answer: the 1-8 socket Linux servers are just ordinary x86 servers, and they are great for SMP workloads such as SAP or ERP or whatever. The 100s socket Linux servers are all clusters. There are no 32 socket Linux SMP servers for sale - and there never have been. Linux scales badly on SMP workloads, the maximum is 8-sockets. If you check 8-socket Linux benchmarks, the cpu utilization is quite bad. For instance, SAP benchmarks show Linux having ~88% cpu utilization whereas Solaris has 99% cpu utilization. Solaris scales much better on as few as 8-socket x86 servers, where Linux has problems. That is the reason Solaris has higher performance on SAP benchmarks, although the Linux server used faster CPUs and faster RAM DIMMs.
Why does the 32 socket Unix servers cost much more than the largest SGI server configuration? Answer: because SGI is a cluster, consisting of X cheap nodes.
Here is a good example of Linux kernel devs ignorant of scale-out and scale-up:
http://vger.kernel.org/~davem/cgi-bin/blog.cgi/200...
"... And here's the punch line, Solaris has never even run on a 1024 cpu system let alone one as big this new SGI system, and Linux has handled it just fine for years. Yet, ZFS creator Jeff Bonwick feels compelled to imply that Linux doesn't scale and Solaris does. To claim that Solaris is more ready to scale on large multi-core systems is pure FUD, and I'm saddened to see someone as technically gifted as Jeff stoop to this level....Now, this all would be amusing if this were the early 90's and us Linux folk were "just a bunch of silly hobbyists." Yet these Solaris guys behave as if we're still in that era."
He clearly has no clue of SGI being a cluster running HPC workloads, whereas Solaris runs SMP workloads on 32/64 socket servers. In fact, decades ago, there was a 144 socket Solaris server. In 2015, Oracle will release a 16,384 thread server with 64TB RAM. The point is: SPARC is doubling performance every generation, whereas Intel is not. SPARC T4 was the world's fastest cpu in Enterprise database workloads two years ago, and last year's SPARC T5 servers are four times as fast as T4. This year, SPARC T6 will arrive, which will be twice as fast again.
There is no chance in hell Intel will match Unix servers on large workloads. 8-socket Intel x86 servers can never compete with 32 or 64 socket Unix servers.
NikosD - Monday, February 24, 2014 - link
@Brutalizer (and Johan)
Very interesting comments, but I would like to ask you, what about Itanium (9500 series)?
I think that Intel keeps 8+ sockets for Itanium series which are capable of up to 32-socket systems.
I can't really answer if there wasn't Itanium, if Intel could build a 32-socket x86-64 system.
BTW, I can't find Enterprise benchmarks for top 9500 Itanium series, like 9560.
What is the performance of a 32-socket Itanium 9560 system (for example Superdome 2 or other) compared to Oracle SPARC M5/M6-32 or an IBM equivalent?
A direct comparison of an 8-socket Itanium 9560 system with an 8-socket Xeon E7 v2 system would also be interesting, to see the internal competition between the two platforms.
Brutalizer - Monday, February 24, 2014 - link
Itanium has very bad performance, as it is not actively developed anymore. Even back then, Itanium had bad performance. Itanium had better RAS than performance. HP provided the RAS capabilities from their HP-UX servers (PA-RISC cpus). And now Intel has learned some RAS from HP, and Intel is trying to tack RAS onto x86 instead, and killing off Itanium. Intel learned RAS. But the x86 RAS is not good enough yet. They can not replay faulty instructions, can not compare output of several cpus and shut faulty cpus down, etc.
But Itanium exists in the 64 socket HP Integrity (or is it Superdome) servers, running HP-UX. This is the Big Tux server I wrote about.
Intel wants desperately go to the high end 16/32 socket servers, where the big money is. But x86 lacks scalability, and has been stuck at 8-sockets forever. Also, the operating system needs to be mature and optimized for 16 or 32 sockets - Linux is not. Because there are no such large Linux servers, it can not be optimized for 16 or 32 socket SMP servers. Only recently, people have been trying to compile Linux to the old mature Unix servers, with bad results. It takes decades to scale well. Scalability is very difficult. Otherwise Intel would be selling 16 or 32 socket x86 servers raking in the BIG money.
But, this x86 E7 cpu is nice, definitely. But to say it will compete in the high end is ridiculous. You need 32 or 64 sockets for that, and at least 32TB RAM. And you need extreme RAS.
Kebab
NikosD - Monday, February 24, 2014 - link
You seem to describe a complete dead end for Intel.
Because if x86 is inherently limited to 8-socket scalability and has no luck with extreme RAS, then Intel's decision was right to invest in another ISA for high end servers, like EPIC.
But if Itanium is a low performance CPU, even with high RAS, then Intel is doomed.
I can't see how it could penetrate into the top high end systems where the big money is.
Kevin G - Monday, February 24, 2014 - link
Intel has announced that several major RAS features from Itanium will make their way to x86 systems. The main ones are integrated lock step and processor hot swap. These two features can be found on specific x86 servers that provide the additional logic for them.
Similarly, x86 can scale beyond 8 sockets with additional glue logic. This is the same for Itanium and SPARC.
Brutalizer - Tuesday, February 25, 2014 - link
"...Similarly, x86 can scale beyond 8 sockets with additional glue logic. This is the same for Itanium and SPARC...."
Yes, there is in fact a 16-socket x86 server released last year by Bullion. It has quite bad performance, and I guess the cpu utilization is not that good, judging by the bad performance. If Intel is going to scale beyond 8-sockets, they need to do it well, or nobody is going to use them for SMP work, when they can buy an 8 or 16 socket SPARC or POWER server, with headroom for growth to 32 or 64 sockets.
Kevin G - Tuesday, February 25, 2014 - link
Except you are intentionally ignoring the glue logic that SGI has developed for the UV2000.
With Intel phasing out Itanium, their x86 Xeon line is picking up where Itanium left off. It does help that both the recent Itaniums and Xeons are using QPI interconnects, as the glue logic developed for one can be used for the other architecture. (I haven't seen this confirmed but I'm pretty sure that the glue logic for the SGI UV1000 was originally intended to be used with Itaniums.)
Brutalizer - Tuesday, February 25, 2014 - link
x86 is not inherently limited to 8-socket scalability. Intel needs to develop techniques to scale beyond 8 sockets - which is very difficult to do. Even if Intel scales above 8-sockets, Intel needs to scale well and utilize all resources - which is difficult. So, with time, we might see 16-socket x86 servers. And in another decade maybe 24 socket x86 servers. But the mature Unix OSes have taken decades to get into the 32 socket arena, and they started big, always had big servers as the target. Intel starts from the desktop, and is trying to venture into larger servers. Intel may succeed given time. But not today.
xakor - Tuesday, February 25, 2014 - link
Are you sure 16+ sockets is the only route to SMP workloads? It seems to me Intel is choosing the direction of not increasing socket count but rather core count, what does that sound like in terms of feasibility? What can be said about a 72 core 3TFlop Xeon Phi behind a large ERP?
stepz - Wednesday, February 26, 2014 - link
Xeon Phi is utterly crap at database/ERP workloads. Extracting memory level parallelism is required to get good per-thread performance in database workloads, and you need a pretty good OoO core to extract it. Xeon Phi is designed for regularly structured computation kernels and will be multiple times slower than a big core. You cannot compensate that with more cores either because the workloads do still have scalability limits where you hit into nasty contention issues. To get even to the level of scalability they currently have has taken blood, sweat and tears on all levels of the stack, from the kernel, through the database engine to the application code using the database.
Kevin G - Monday, February 24, 2014 - link
Even with Itanium's poor performance, it doesn't stop you from citing the Big Tux experiment to slander overall Linux performance.
Brutalizer - Tuesday, February 25, 2014 - link
The reason I cite Big Tux is that those are the only benchmarks I have seen for Linux running on 64 sockets. If you have other benchmarks, please link to them so I can stop referring to Big Tux.
I have never attributed Linux's bad performance on Big Tux to Itanium's poor performance. I attribute Linux's bad performance on Big Tux to this: Linux had ~40% cpu utilization on the 64 socket Big Tux Itanium server. This means every other cpu idles under full load when using Linux. Is this bad or not? This has nothing to do with Itanium. If Linux ran 64 socket SPARC or POWER - it would still idle ~40%.
Thus, my conclusion about Linux's bad performance is based on the low cpu utilization. It has nothing to do with how fast or slow the hardware is. Instead, how well does Linux utilize all resources on large servers? Answer: very badly.
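The utilization figures thrown around in this thread are easier to compare when converted into "sockets' worth of work". This is a minimal sketch that treats utilization as a linear proxy for useful work, which is a simplifying assumption (a busy CPU spinning on a lock counts as utilized). The inputs are the percentages quoted in the comments, not independently verified data, and the function names are made up for illustration.

```python
def effective_sockets(sockets: int, utilization: float) -> float:
    """Sockets' worth of busy time, treating utilization as linear."""
    return sockets * utilization


def idle_fraction(utilization: float) -> float:
    """Share of total CPU time spent idle."""
    return 1.0 - utilization


# (label, sockets, utilization) as quoted in the thread.
cases = [
    ("Big Tux, Linux, 64 sockets", 64, 0.40),
    ("SAP, Linux, 8 sockets", 8, 0.88),
    ("SAP, Solaris, 8 sockets", 8, 0.99),
]

for label, sockets, util in cases:
    busy = effective_sockets(sockets, util)
    print(f"{label}: {busy:.2f} sockets busy, {idle_fraction(util):.0%} idle")
```

Note that at ~40% utilization the machine is idle about 60% of the time, so "every other cpu idles" slightly understates the figure being quoted.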
Talking about slandering Linux, have you read this from a prominent Linux kernel developer?
http://vger.kernel.org/~davem/cgi-bin/blog.cgi/200...
"...And here's the punch line, Solaris has never even run on a 1024 cpu system let alone one as big this new SGI system, and Linux has handled it just fine for years. Yet Mr. Bonwick feels compelled to imply that Linux doesn't scale and Solaris does. To claim that Solaris is more ready to scale on large multi-core systems is pure FUD, and I'm saddened to see someone as technically gifted as Jeff stoop to this level..."
Who is slandering who? Is it FUD to say that Linux has scalability problems over 8 sockets? Is it FUD to say that there has never been a 32 socket Linux server for sale? Or is it just that he is not aware of different types of scalability: clusters or SMP servers? Is it just pure ignorance, when he believes a 4096 core Linux cluster can replace a 32 socket SMP server? What do you think? Is it FUD when the ZFS creator claims that Linux does not scale on 32 socket servers, or is it in fact a true claim? Who is FUDing who?
Kevin G - Tuesday, February 25, 2014 - link
Linux scales just as well as Unix on large socket counts. Case in point are IBM's own benchmarks on their p795 systems with 32 sockets, 256 cores and 1024 threads: AIX only beats Linux by a mere 2.7%. Source: http://www-03.ibm.com/systems/power/hardware/795/p...
I should also point out that your link is 7 years old. Things have changed in the Linux kernel.
hoboville - Monday, February 24, 2014 - link
Well you're right, but it's not as bad for x86 as you make it sound. Systems like TITAN were examples of scale-out compute, if ever there was one. I'll grant it's not the same in terms of what they calculate (Titan is simulation focused and GPU focused) and less on pure RAS and rapid DB access like ERP (not transactional / real time). But that's essentially irrelevant. The point is how they scale in terms of number of nodes and the cost of nodes.
Intel's newest chip is cool, but not practical in terms of price competition (why Titan used more Opteron nodes instead of Xeon, for example). What you're focused on is price competition at the ultimate upper end of the spectrum, where SPARC and Power live. And that, in turn, the price of the highest end single system. Intel may be trying to break into that space, but no, it doesn't make sense because x86 wasn't designed for it as an architecture. Their single systems won't compete, yet.
But that's not to say this new Xeon is irrelevant. It isn't. It will, however, have problems because the price-per-performance isn't competitive. In a scale-out design you want more, cheaper nodes and beat the competition by volume. These nodes are just too expensive when you want performance per dollar.
What most mid-to-large companies need is a scalable setup that grows with their business. A lot of IT is bean counting and cost cutting. If you want to start SMP, you start small and tack on additional systems, because your budget people won't let you get a SPARC system or Unix setup. Oracle just doesn't offer systems or prices that are reasonable, and because of this, many businesses that need SMP won't give them a second glance. This is where x86 and Xeon fit into the picture, scale out, starting small and building up. But these new systems are asking too much and people aren't going to be interested.
Kevin G - Monday, February 24, 2014 - link
Intel has effectively killed off the Itanium. The original 22 nm Kitson has been scrapped and the successor to Poulson is going to be on 32 nm as well. After that, nothing appears on Intel's roadmap for the chip.
HP, the largest Itanium customer, has already announced that their NonStop mainframe line is moving to x86:
Kevin G - Monday, February 24, 2014 - link
Forgot the link: http://h17007.www1.hp.com/us/en/enterprise/servers...
Kevin G - Monday, February 24, 2014 - link
"So, instead of you telling me I am wrong, I suggest you just show us links with SMP workloads for the SGI UV2000 server... then you are right, and I am wrong. And I will shut up."
United States Post Office running Oracle Data Warehouse software on a SGI UV1000 (the older sibling of the UV2000, still shared memory and cache coherent):
https://www.fbo.gov/index?s=opportunity&mode=f...
SGI and MarkLogic for Big Data:
http://www.v3.co.uk/v3-uk/news/2216603/sgi-and-mar...
I've also found passing references to other government (No Such Agency?) installations of a UV2000 running Hadoop.
Brutalizer - Tuesday, February 25, 2014 - link
But please, Kevin G, don't you know that Hadoop is a clustered solution? Why do you think people are running clustered database solutions such as Hadoop on a SGI UV2000 server? Is it because SGI says it is for clustered benchmarks only?
And yes, there are clustered databases.
Kevin G - Tuesday, February 25, 2014 - link
Did you not see the link where the USPS is running Oracle workloads on a UV1000? I'll post it again so that you may see: https://www.fbo.gov/index?s=opportunity&mode=f...
Kevin G - Tuesday, February 25, 2014 - link
There are a couple of reasons why someone would want to run Hadoop on a UV2000. First, the UV2000 has a large global address space in which data can reside directly (i.e. no disk access necessary!). If the raw data fits in 64 TB, performance should be very good. Secondly, Hadoop is free under the Apache license. Traditional database software like Oracle charges a premium the more sockets there are installed on a system. I'd imagine that a 256 socket UV2000 system would incur an Oracle licensing fee in the tens of millions of US dollars. So between the choice of free or tens of millions of dollars, most organizations would at least try to work with the free solution.
Kevin G - Monday, February 24, 2014 - link
"Thus, the x86 does not scale above 8-sockets."
The SGI UV2000 is a fully cache coherent server that scales up to 256 sockets. It uses some additional glue logic but this is no different than what Oracle uses to obtain similar levels of scalability.
"Some examples of Scale-out servers (clusters) are all servers on the Top-500 supercomputer list. Other examples are SGI Altix / UV2000 servers... Scale-up servers, are one single fat huge server."
SGI correctly classifies these as scale-up servers as they are not a cluster. ( http://www.sgi.com/products/servers/uv/ )
"The reason is the Linux kernel devs does not have access to 32 socket SMP server, because they dont exist, so how can Linux kernel be optimized for 32 sockets?"
"Ted Tso, the famous Linux kernel developer writes:
http://thunk.org/tytso/blog/2010/11/01/i-have-the-... "
Oh wow, you're confusing a file system with the kernel. You do realize that Linux has support for many different file systems? Even then Ext4 is actually shown to scale after a few patches per that link. Also of particular note is that 4 years ago when that article was written Ext4 was not suited for production purposes. In the years since, this has changed as has its scalability.
"For instance the Big Tux HP server, compiled Linux to 64 socket HP integrity server with catastrophic results, the cpu utilization was ~40%, which means every other cpu idles under full load. Google on Big Tux and read it yourself."
Big Tux was an ancient Itanium server that was constrained by equally ancient FSB architecture. Even with HP-UX, developers are lucky to get high utilization rates due to the quirks of Itanium's EPIC design.
"SGI servers are only used for HPC clustered workloads, and never for SMP enterprise workloads:
http://www.realworldtech.com/sgi-interview/6/ "
Readers should note that this link is a decade old and obviously SGI technology has changed over the past decade.
"Thus, this Intel Xeon E7 cpu are only used up to 8-sockets servers. For more oomph, you need 32 socket or even 64 sockets - Unix or Mainframes."
Modern x86 and Itanium chips from Intel only scale to 8 sockets without additional glue logic. This is similar to modern SPARC chips from Oracle which need glue logic to scale past 8 sockets. IBM is the only major vendor which does not use glue logic as the GX/GX+/GX++ use a multi-tiered ring topology (one for intra-MCM and one for inter-MCM communication).
"Another reason why this Intel Xeon E7 can not touch the high end server market (beyond scalability limitations) is that the RAS is not good enough."
Actually Stratus offers Xeon servers with processor lock step: http://www.stratus.com/Products/Platforms/ftServer...
x86 servers have enough RAS that HP is moving their NonStop mainframe line to Xeons:
http://h17007.www1.hp.com/us/en/enterprise/servers...
"Thus:
-Intel Xeon E7 does not scale above 8-sockets. Unix does. So you will never challenge the high end market where you need extreme performance. Besides, the largest Unix servers (Oracle) have 32TB RAM. Intel Xeon E7 has only 6TB RAM - which is nothing. So x86 does not scale cpu wise, nor RAM wise."
The new Xeon E7v2's can have up to 1.5 TB of memory per socket and in an 8 socket system that's 12 TB before needing glue logic. The SGI UV2000 scales to 256 sockets and 64 TB of memory. Note that SGI's UV2000's memory capacity is actually limited by the 46 bit physical address space while maintaining full coherency.
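The capacity figures above are straightforward to check. This is a small sketch of the arithmetic, using binary units for the address space limit; the function names are made up for illustration, and the per-socket and address-width figures are the ones quoted in this comment.

```python
TIB = 2 ** 40  # bytes in one tebibyte


def addressable_tib(physical_address_bits: int) -> int:
    """Memory reachable with a given physical address width, in TiB."""
    return (2 ** physical_address_bits) // TIB


def total_capacity_tb(sockets: int, tb_per_socket: float) -> float:
    """Total memory when every socket is fully populated."""
    return sockets * tb_per_socket


print(f"8 sockets x 1.5 TB = {total_capacity_tb(8, 1.5):.1f} TB")
print(f"46-bit physical addressing reaches {addressable_tib(46)} TiB")
print(f"64 TB over 256 sockets = {64 / 256:.2f} TB per socket")
```

In other words, the 64 TB ceiling on the UV2000 is exactly what a 46 bit physical address space allows, so at 256 sockets the limit is the addressing rather than the number of DIMM slots.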
"-Intel Xeon E7 has no sufficient RAS, and the servers are unreliable, besides the x86 architecture which is inherently buggy and bad (some sysadmins would not touch a x86 server with a ten feet pole, and only use OpenVMS/Unix or Mainframe):
http://www.anandtech.com/show/3593 "
Nice. You totally missed the point of that article. It was more a commentary on yearly ISA increases in the x86 space and differences between AMD and Intel's implementations. This mainly played out with the FMA instructions between AMD and Intel (AMD supported 4 operand FMA in Bulldozer whereas Intel supported 3 operand FMA in Sandy Bridge. AMD's Piledriver core added support for 3 operand FMA.) Additionally, ISA expansion should be relatively rare, not a yearly cadence to foster good software adoption.
ISA expansion has been a part of every platform so by your definition, everything is buggy and bad (and for reference, IBM's z/OS mainframes have even more instructions than x86 does).
"-Oracle is much much much much cheaper than IBM POWER systems. The Oracle SPARC servers pricing is X for each cpu. So if you buy the largest M6-32 server with 32TB of RAM you pay 32 times X. Whereas IBM POWER systems costs more and more the more sockets you buy. If you buy 32 sockets, you pay much much much more than for 8 sockets."
This came out of nowhere in the conversation. Seriously, in the above post, where did you mention pricing for POWER or SPARC systems? Your fanboyism is showing. I think you cut/paste this from the wrong script.
Brutalizer - Tuesday, February 25, 2014 - link
Regarding my link about Ted Tso, talking about filesystems. You missed my point. He says explicitly that Linux kernel developers did not have access to large 48 core systems. 48 cores translates to... 8-sockets. I tried to explain this in my post, but apparently failed. My point is, if prominent Linux kernel developers think 8-socket servers are "exotic hardware" - how well do you think that Linux scales on 8-sockets? No Linux developer has such a big server with 8-sockets to optimize Linux for. Let alone 16 or 32 sockets. I would be very surprised if Linux scaled well beyond 8-sockets without even optimizing for larger servers.
Then you talk about how large the SGI UV2000 servers are, etc etc. And my link where SGI explains that their predecessor Altix server is only suitable for HPC workloads - is rejected by you. And the recent ScaleMP link I showed, where they say it is only used for HPC workloads - is also rejected by you I believe - on what grounds I don't know. Maybe because it is 2 years old? Or the font on the web page is different? I don't know, but you will surely find something to reject the link on.
Maybe you do accept that the SGI Altix server is a cluster fit for HPC workloads, as explained by SGI? But you do not accept that the UV2000 is a successor to Altix - but instead the UV2000 server is a full blown SMP server somehow. When the huge companies IBM and Oracle and HP are stuck at 32 sockets, suddenly, SGI has no problems scaling to 1000s of cpus for a very cheap price. You don't agree something is a bit weird in your logical reasoning?
Unix: after decades of research, the largest companies (IBM, Oracle and HP) are stuck at 32 sockets. Extremely expensive servers, one single 32 socket server at $35 million.
Linux: No problemo sailing past 32 sockets, hey, we talk about 100,000s of cores. Good work by the small SGI company (the largest UV2000 server has 262,144 cores). And also, same work by the startup ScaleMP - also selling 1000s of sockets. For a cheap price. But hey, why be modest and stop at a quarter million cores? Why not a quarter million sockets? Or a couple of million?
There is no problem here? What the three largest companies can not do, over decades, SGI and ScaleMP and other Linux startups have no problem with? A quarter of a million cores? Are you sh-tting me? Do you really believe it is a SMP server, used for SMP workloads, even though both SGI and ScaleMP say their servers are for HPC clustering workloads?
Brutalizer - Tuesday, February 25, 2014 - link
And how do you explain the heavy use of HPC libraries such as MPI in the UV2000 clusters? You will never find MPI in an enterprise business system. They are only used for scientific computations. An SMP server does not use MPI at all, didn't you know?
http://www.google.se/url?sa=t&rct=j&q=&...
Kevin G - Tuesday, February 25, 2014 - link
Very simple: MPI is a technique to ensure data locality for processing, regardless of whether it is a cluster or a multi-socket system. It reduces the number of hops data has to traverse, regardless of whether it is an SMP link between sockets or a network interface between independent systems. Fewer hops means greater efficiency, and greater efficiency equates to greater throughput.
Also, if you had actually read that link you'd have realized that the UV2000 is not a cluster. It is a fully coherent system with up to 64 TB of globally addressable memory.
Kevin G - Tuesday, February 25, 2014 - link
"Regarding my link about Ted Tso, talking about filesystems. You missed my point. He says explicitly, that Linux kernel developers did not have access to large 48 core systems. "A lot of Linux developers are small businesses or individuals as is the beauty of open source software - everyone can contribute. It also means that not everyone will have equal access to resources. There are some large companies that invest heavily into Linux like IBM. They have managed to tune Linux to get to 2.7% the performance of AIX on their 32 socket, 256 core, 1024 thread p795 system in SPECjbb2005. Considering the small 2.7% difference, I'd argue that Linux scales rather well compared to AIX.
"Then you talk about how large the SGI UV2000 servers are, etc etc. And my link where SGI explains that their predecessor Altix server is only suitable for HPC workloads - is rejected by you."
Yes, and rightfully so, because you're citing a decade-old link to their predecessor that has a different architecture.
"But you do not accept that the UV2000 is a successor to Altix - but instead the UV2000 server is a full blown SMP server somehow. When the huge companies IBM and Oracle and HP are stuck at 32 sockets, suddenly, SGI has no problems scaling to 1000s of cpus for a very cheap price. You dont agree something is a bit weird in your logical reasoning?"
Not at all. SGI developed the custom glue logic, NUMALink6, to share memory and pass coherency throughout 256 sockets. Oracle developed the same type of glue logic for SPARC that SGI developed for x86. Only thing noteworthy here is that SGI got this type of technology to market first in their 256 socket system before Oracle could ship it in their 96 socket systems. The source for this actually comes from a link that you kindly provided: http://www.theregister.co.uk/2013/08/28/oracle_spa...
And for the record, IBM has a similar interconnect as well for the POWER7. The thing about the IBM interconnect is that it is not cache coherent across the glue logic, though the 32 dies on one side of the glue are fully cache coherent. The main reason for losing coherency in this topology is that the physical address space of the POWER7 can be exceeded, at which point coherency would simply fail anyway. All the memory in these systems is addressable through the virtual memory though. Total number of dies is 16384, with 131,072 cores and 524,288 threads. Oh, and this system can run either AIX or Linux when maxed out. Source: http://www.theregister.co.uk/Print/2009/11/27/ibm_...
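The totals quoted above can be sanity-checked with a few lines. This is only a minimal sketch, assuming the usual POWER7 layout of 8 cores per die and 4 SMT threads per core; neither figure is stated in the comment itself, so both constants are labeled as assumptions.

```python
# Assumed POWER7 layout; the comment above gives only the totals.
CORES_PER_DIE = 8      # assumption: 8-core POWER7 die
THREADS_PER_CORE = 4   # assumption: SMT4


def totals(dies: int) -> tuple[int, int]:
    """Return (cores, threads) for a system built from `dies` POWER7 dies."""
    cores = dies * CORES_PER_DIE
    return cores, cores * THREADS_PER_CORE


print(totals(16384))  # the maxed-out configuration mentioned in the comment
print(totals(32))     # one fully cache-coherent group of 32 dies
```

Under those assumptions the 32-die case also lines up with the 256-core, 1024-thread p795 mentioned earlier in the thread.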
So really, all the big players have this technology. The differences are just how many sockets a system can have before this additional glue logic is necessary, how far coherency goes and the performance impact of the additional traffic hops the glue logic adds.
"There is no problem here? What the three largest companies can not do, under decades, SGI and ScaleMP and other Linux startups has no problem with? Quarter of million of cores? Are you sh-tting me? Do you really believe it is a SMP server, used for SMP workloads, even though both SGI and ScaleMP says their servers are for HPC clustering workloads?"
The SGI UV2000 fits all the requirements for a big SMP box: cache coherent, global address space and a single OS/hypervisor for the whole system. And as I mentioned earlier, both IBM and Oracle also have their own glue logic to scale to large number of cores.
As for the whole 'under decades' claim, scaling to large numbers of cores hasn't been possible until relatively recently. The integration of memory controllers and point-to-point coherency links has vastly simplified the topology for scaling to a large number of sockets. To scale efficiently with a legacy FSB architecture, the north bridge chip with the memory controller would need to have an FSB connection to each socket. Want 16 sockets? The system would need 16 FSBs stemming off of that single chip. Oh, and for 16 sockets the memory bandwidth would have to increase as well; figure one DDRx channel per FSB. That'd be 16 FSB links and 16 memory channels coming off of a single chip. That is not practical by any means. IBM in some of their PowerPC/POWER systems used a ring topology before memory controllers were integrated. Scaling there was straightforward: just add more hops on the ring, but performance would suffer due to the latency penalty for making each additional hop.
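The pin-count argument above can be put into numbers. The following is a rough sketch, not a model of any real chipset: the function names are made up for illustration, the "one memory channel per FSB" rule is taken from the comment, and the ring is assumed to be bidirectional unless stated otherwise.

```python
def hub_requirements(sockets: int) -> dict:
    """Links a single north bridge would need if every socket gets its own
    FSB and, as the comment suggests, one memory channel per FSB."""
    return {"fsb_links": sockets, "memory_channels": sockets}


def ring_worst_case_hops(sockets: int, bidirectional: bool = True) -> int:
    """Worst-case number of hops between two sockets on a ring.

    Assumption: on a bidirectional ring traffic takes the shorter way round;
    on a unidirectional ring it may have to travel almost the full circle.
    """
    if bidirectional:
        return sockets // 2
    return sockets - 1


for n in (4, 8, 16):
    print(n, hub_requirements(n), ring_worst_case_hops(n))
```

The hub design needs a link count on one chip that grows with every socket, while the ring keeps each chip simple and pays in hops (latency) instead.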
As for what the future holds, both Intel and IBM have been interested in silicon photonics. By directly integrating fiber connections into chip dies, high end Xeons and POWER chips respectively will scale to even further heights than they do today. By ditching copper, longer distances between sockets can be obtained with a signal repeater, a limiting factor today.
BOMBOVA - Tuesday, February 25, 2014 - link
Yes, you are insightful, learned, and express yourself with linearity. "You're a teacher", thanks, but where are your other thoughts? Cheers from Thomas in Vancouver, Canada
helixone - Tuesday, February 25, 2014 - link
The E7 v2 family of processors should give Intel a seat at the scale-up table, with architectural support for 15 cores/socket, 32 socket systems and 1.5 TB RAM per socket. IE: A single system with 480 fat cores and 48TB RAM.
Sure, they aren't going to take the top of the scale-up charts with this generation, but they should have another belly-busting course of eating into the remaining Sparc, Power and (yes) Itanium niches. (It's only a matter of time until scale-up will be owned by Intel, with all other architectures being in decline. IE: Oracle and IBM will only be able to justify so much development into a lagging platform.)
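For anyone who wants to check the 480-core / 48TB figure, here is a minimal sketch of the arithmetic. The defaults are simply the per-socket numbers quoted in the comment; the function name is made up for illustration.

```python
def max_config(sockets: int = 32,
               cores_per_socket: int = 15,
               ram_tb_per_socket: float = 1.5) -> tuple[int, float]:
    """Return (total cores, total RAM in TB) for a fully populated system."""
    return sockets * cores_per_socket, sockets * ram_tb_per_socket


cores, ram_tb = max_config()
print(f"{cores} cores, {ram_tb} TB RAM")

# A glueless 8-socket box, for comparison
print(max_config(sockets=8))
```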
Personally, I am curious if in 15-20 years we'll be talking about ARM64 servers taking on/out the legacy x86 scale-up servers.
Nenad - Thursday, February 27, 2014 - link
Intel based servers can scale over 8 CPUs. While you seem very biased toward "big iron", it should be noted that each vendor has some proprietary solution to connect multiple sockets. And Intel is offering a non-proprietary way to connect up to 8 sockets. Above that you can use the same approach as the "big iron" Oracle/IBM solutions and offer a proprietary interconnect between groups of 8 Intel CPUs. Even IBM used to do that - I was working with Intel based servers with many more CPU sockets than the maximum of 4 sockets supported back then by Intel. Those servers used a proprietary IBM interconnect between boxes, each containing 4 sockets (I think each CPU had 4 cores then), 32GB RAM and I/O.
While using two such boxes instead of one will not result in a linear performance improvement (the box interconnect is slower than the links between the sockets inside a box), such servers use an OS that supports NUMA (non-uniform memory access) to reduce between-box communication. In addition, many enterprise applications are optimized for such NUMA scenarios and scale almost linearly. We used Windows as the OS (supports NUMA) and MS SQL as the enterprise app (supports NUMA), and scalability was excellent even above the native Intel 4/8 sockets.
And nowadays such Intel based servers are even better, with 8 CPUs (=120 cores) and 6TB RAM PER BOX, multiplied by the number of boxes you use.
End result: even without linear scaling, multi-box Intel servers can outperform IBM/Oracle servers while costing less. Your "only UNIX can scale up" comment is clearly wrong - what really keeps UNIX/IBM/Oracle in the enterprise is not scale-up ability, it is the software that was historically made for those OSes. Not to mention that enterprises are VERY conservative ("can you show us 10 companies bigger than us, in our region, that use that Windows/Intel for main servers? No? Then we will stay at UNIX or IBM - no one was ever fired for choosing IBM after all ;p" - but even that is slowly changing, probably because they can see those "10 companies")
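The "less than linear, but still worthwhile" point can be illustrated with a toy model. This is only a sketch: the per-box figures (120 cores, 6TB) come from the comment, while `efficiency` is a purely hypothetical knob for how much of an extra box's capacity survives the slower box-to-box interconnect. It is not a measured value for any real system.

```python
CORES_PER_BOX = 120   # 8 CPUs x 15 cores, as quoted in the comment
RAM_TB_PER_BOX = 6


def multi_box(boxes: int, efficiency: float = 0.9) -> dict:
    """Toy model of a multi-box NUMA server.

    `efficiency` is a hypothetical fraction of each additional box's
    capacity that is actually usable; 1.0 would mean linear scaling.
    """
    effective_cores = CORES_PER_BOX * (1 + (boxes - 1) * efficiency)
    return {
        "raw_cores": boxes * CORES_PER_BOX,
        "ram_tb": boxes * RAM_TB_PER_BOX,
        "effective_cores": effective_cores,
    }


for n in (1, 2, 4):
    print(n, multi_box(n))
```

The better an application keeps its data local to one box (the NUMA optimization mentioned above), the closer `efficiency` would get to 1.0 in such a model.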
Pox - Wednesday, March 12, 2014 - link
On the plus side, for Linux development, older 32 and 64 socket mainframes can now be had fairly cheap relative to their "new" pricing. This will aid the ongoing scaling development of Linux. You can grab a Superdome for under 10k, but you will still have to fill the server and additional cabinets with cells, processors and memory. But all in all, they are getting much easier to afford in the broker market.
Phil_Oracle - Monday, February 24, 2014 - link
Need to ask yourself: Why is it that IBM hasn’t published any benchmarks in 3+ years except for certain corner cases? When IBM released Power7, they released benchmarks across every benchmark out there from TPC-C, TPC-H, SPECjbb2005, SPEC_OMP, SPEC CPU2006, SAP, etc. When Power7+ came out, there were only non I/O based benchmarks released. No DB benchmarks, no STREAM benchmark, etc. So maybe Oracle had no choice but to compare against 3-year old results? And why hasn’t IBM published newer results? Maybe because Power7+ is less than a 10% improvement? That’s what IBM's own rPerf metric tells us.
Kevin G - Monday, February 24, 2014 - link
With POWER8 due out later this year, I suspect they'll be updating their old benchmarks with the newer hardware.
The real question is why hasn't IBM ever submitted benchmarks for their z-series mainframes? Performance data there is very lacking. Though z-series customers tend to fall into two groups: legacy mainframe applications and those who desire ultimate RAS regardless of the performance.
Phil_Oracle - Tuesday, February 25, 2014 - link
Yes, we shall see what Power8 delivers and when. It's already a year late according to IBM's "3-year cadence". Power7 is 4 years old this month! As for Mainframe, it's not about performance, it’s about uptime, but at some point you can get uptime through clustering and redundancy, and then performance becomes the issue. We once did a POC comparing performance of the latest Mainframe vs SPARC M6 and we estimated SPARC M6-32 to be 2-3x higher MIPs! As you can imagine, the customer is migrating.
Kevin G - Tuesday, February 25, 2014 - link
Everyone has been suffering delays with chips it seems. Intel, even with their process advantage, looks to be 9 months to a year behind schedule for their 14 nm roll out. IBM/TSMC/GF/Samsung are similarly behind in their roll out of 22/20 nm class logic.
There has been a desire for ages to get off of mainframes in some industries. Reliability is 'good enough' and performance is better, but the reason some don't migrate is simply software costs. I used to work in such a shop and the mainframe hung around due to the extensive cost of porting and validating all the legacy software. Also 'if it ain't broke, don't fix it' was a theme at that place and well, the mainframe was never broken. I figure that many mainframe shops fall into that category.
A decked-out M6-32 outrunning a mainframe by 2x is within reason for some CPU tests. I'm more curious as to what specific workloads they were. In IO bound tests, the mainframe is still competitive due to the raw amount of coprocessors and dedicated hardware thrown into the niche. Flash in the enterprise has helped narrow the IO gap significantly, but I don't think it has managed to surpass the ancient mainframe architecture.
PowerTrumps - Monday, February 24, 2014 - link
Probably because most of their numbers have held up by and large to the competition. Unlike Sun SPARC and now Oracle SPARC which had disappeared from the benchmark scene for years with T1-T3 and most Fujitsu based servers. Oracle had cherry picked obscure benchmarks with T4 and now with T5 they have had a lot to make up. So, although you make it sound impressive let's not forget the past and the gap that needed to be filled.
Phil_Oracle - Tuesday, February 25, 2014 - link
I'm a 15 yr Sun veteran now at Oracle so yes, I agree that in the past, with older generation SPARC, especially the first generation T-Series, Sun only benchmarked where the T-Series did well and avoided benchmarks where it didn't, as it was designed for web tier workloads. That was 5 generations ago! But that’s my point. A vendor isn't going to publish a poor or worse looking result than the previous version, so every vendor "cherry picks" as you say. Not having a benchmark tells me that either the previous version is better, or the new version isn't that much better or is worse (whether in throughput, per core, etc). In any case, the more benchmarks, the better the sign that it's leading. And while SPARC T4 was really the first Oracle-developed SPARC processor, it caught up to competing CPUs, and with SPARC T5 and even SPARC M6, it's hard to argue that SPARC T5 is not leading. With 16 cores, 8 threads/core @ 3.6GHz, and glueless scalability to 8 sockets, and SPARC M6 @ 12 cores, 8 threads/core up to 32 sockets and now almost a year old, Intel's latest Xeon Ivy Bridge-EX has finally caught up, but in certain areas, like DB and middleware performance, it is still lacking benchmark proof points to show it's superior. And as for Power8, well, we'll just have to wait and see what the systems will deliver and when. Clearly they are aiming at SPARC for the high end, now that Itanium is all but dead, and on entry to mid range, competing against Xeon.
thunng8 - Friday, February 21, 2014 - link
Great for Intel that they have finally marginally overtaken a several year old IBM box in the SAP SD benchmark. Only trouble is the 2.5x faster POWER8 (compared to POWER7) is coming in the next few months.
extide - Friday, February 21, 2014 - link
Keep in mind that IBM POWER Chips are typically 200-250W TDP Chips. So yeah, on a performance per watt scale, these are quite impressive!
Kevin G - Friday, February 21, 2014 - link
POWER7 is 200W and POWER7+ is 180W. Still higher than Intel but not as bad as you'd think.
JohanAnandtech - Saturday, February 22, 2014 - link
Do you have a source for that? It is pretty hard to find good info on those CPUs. Or I have missed it somehow.
Kevin G - Saturday, February 22, 2014 - link
IBM, like Intel, bins chips by power consumption. It looks like there are indeed 250W POWER7's but they do scale down to 150W.
800W MCM for super computing, 200W POWER7 die @ 3.83 Ghz
http://www.theregister.co.uk/Print/2009/11/27/ibm_...
The final shipping speed was 3.83 Ghz which falls into the 3.5 to 4.0 Ghz range target in the article.
250W for high end boxes & 150W for blade systems:
http://www.realworldtech.com/forum/?threadid=12393...
Note that this was an early IBM paper and that 300W per socket figure could have been provisioning for future dual die POWER7+ modules
250W for POWER7 @ 4.0 Ghz and 250W for POWER7+ @ 4.5 Ghz:
http://www-05.ibm.com/cz/events/febannouncement201...
I'm trying to find the source to the 180W POWER7+ figure. The difficulty is that it appeared in a discussion about Intel's Poulson Itanium which consumes 10W less.
Kevin G - Saturday, February 22, 2014 - link
Not 100% sure since I'm not an IEEE member to view it, but this paper may be the source for the POWER7+ figures:
http://ieeexplore.ieee.org/xpl/articleDetails.jsp?...
Phil_Oracle - Monday, February 24, 2014 - link
TDP is great for comparing chip to chip, but what really matters is system performance/watt. And although Intel's latest Xeon E7 v2 may have better TDP specs than either Power7+ or SPARC T5, when you look at the total system performance/watt, SPARC T5 actually leads today due to its higher throughput, core count, 4 x more threads, built-in encryption engines and higher optimization with the Oracle SW stack.
Flunk - Friday, February 21, 2014 - link
8 core consumer chips now please. If you have to take the GPU off go for it.
DanNeely - Friday, February 21, 2014 - link
Assuming you mean 8 identical cores: until mainstream consumer apps that can use more CPU resources than the 4 HT cores in Intel's high end consumer chips, but which can't benefit from GPU acceleration, become common, it's not going to happen.
I suppose Intel could do a big.little type implementation with either core and atom or atom and the super low power 486ish architecture they announced a few months ago in the future. But in addition to thinking it was worthwhile for the power savings, they'd also need to license/work around ARM's patents. I suppose a mobile version might happen someday; but I don't really see a plausible benefit for laptop/desktop systems that don't need continuous connected standby like phones do.
Kevin G - Friday, February 21, 2014 - link
Intel hasn't announced any distinct plans to go this route, but they're at least exploring the idea at some level. SkyLake and Knights Landing are to support the same ISA extensions, and in principle a program could migrate between the two types of cores.
StevoLincolnite - Saturday, February 22, 2014 - link
Er. You don't need apps to use more than 4 threads to make use of an 8 core processor.
Whatever happened to running several demanding applications at once? Surely I am not the only one who does this...
My Sandy-Bridge-E processor being a few years old is starting to show its age in such instances, I would cry tears of blood for an 8-Core Haswell based processor to replace my current 6-core chip.
psyq321 - Monday, March 10, 2014 - link
Well, you can buy a bigger Ivy Bridge EP Xeon CPU and fit it in your LGA2011 system.
This way you can go up to 12 cores and not have to wait for 8-core Haswell E.
SirKnobsworth - Friday, February 21, 2014 - link
8 core Haswell-E chips are due out later this year. You can already buy 6 core Ivy Bridge-E chips with no integrated graphics.
TiGr1982 - Friday, February 21, 2014 - link
Did you know:
Haswell-E is supposed to be released in Q3 this year, to have up to 8 Haswell cores with HT, fit in the new revision of Socket LGA2011 (incompatible with the current desktop LGA2011), and work with DDR4 and X99 chipset. No GPU there, since it's a byproduct of server Haswell-EP.
Harry Lloyd - Friday, February 21, 2014 - link
That will not help much, unless they release a 6-core chip for around 300 $, replacing the lowest LGA2011 4-core chips. It is about time.
TiGr1982 - Friday, February 21, 2014 - link
I think, 6 cores on desktop for $300 will NOT happen this year.
Because if it will, then you'll get $300 4 core i7 on mainstream 1150 & $300 6 core i7 on new 2011 simultaneously on the market.
To adjust this, they'll have to sell 1150 4 core i7 for $200-$220, like Core i5 now.
This is not realistic, because that's Intel we're talking about, right?...
dragonsqrrl - Friday, February 21, 2014 - link
That's actually the plan, except it won't be $300. I think the latest leaks suggest that the lowest end Haswell-E SKU will be a 6-core K series at ~$400. The other two price points remain about the same, $600 and $1000 for the 8-core SKU's.
TiGr1982 - Saturday, February 22, 2014 - link
To me, seems too good to be true. Will require a major change of mindset inside Intel to start selling 6 core for $400 and lower 8 core for $600 :)
(while 8 core XE for $1000 is not surprising at all)
Harry Lloyd - Saturday, February 22, 2014 - link
The thing is LGA2011 mobos are really expensive, so the CPU price does not have to be that high. You can get a good B85 mobo even for less than 100 $, and LGA2011 mobos start at 250 or even 300 $.
I would not pay 300 $ for a mobo, and 400 $ for a 6-core CPU, that would still be ridiculous. I hate this stagnation. The transition from 1-core to 4-core happened really quickly.
MrSpadge - Saturday, February 22, 2014 - link
The smallest 6-core K model has been around 500$ for quite some time, so I see no problem going to 400$ this time. 8 cores for 600$ would indeed be a significant step for some, though.
psyq321 - Monday, March 10, 2014 - link
Well, if Intel manages to castrate the HEDT "E" version enough so that it does not pose any threat to their Xeon revenue, a price drop might happen.
However, one factor not to be underestimated is the total available market and how much the target consumers for this kind of hardware are willing to pay. I have no data, but for some reason I think only a small % of "power users" (>very< power users) need 8 cores today and they would probably be willing to shell out $1000.
Thing is, if you are Intel, you will probably be making the calculation: what if we drop the price to, say, $600? Is this going to bring us more customers? Is this going to cannibalize some of the more lucrative Xeon market?
I suppose if Intel fuses out TSX, VT-D, ECC memory support and, of course, QPI (which is what they do anyway with Sandy-E and Ivy-E HEDT CPUs) the chip would practically be next to useless to most Xeon customers. So the remaining issue is the market.
f0d - Friday, February 21, 2014 - link
i agree
i was hoping for 8 core ivy bridge-e chips but had to settle for 6 cores which i can easily use all of
i do a LOT of video encoding using handbrake and that program just loves cores, i easily saturate all 12 threads with my settings in handbrake so i do believe it could use a single socket 8 core well (i have read tests that show handbrake not liking dual/quad socket systems for more cores - but does improve when using lots of cores on a single socket)
MT007 - Friday, February 21, 2014 - link
You have an error on page 8, in your fourth paragraph you have the Opteron as 2.4ghz and only with a score of 2481. From your graph it should have been 2.3ghz and 2723?
webmastir - Friday, February 21, 2014 - link
They don't tend to fix errors/read comments I don't think.
JohanAnandtech - Friday, February 21, 2014 - link
Sure we do :-)
JohanAnandtech - Friday, February 21, 2014 - link
I don't see the error. "Beckton" (Nehalem-EX, X7560) is at 2.4 GHz
mslasm - Sunday, February 23, 2014 - link
> I don't see the error.The article says "The Opteron core is also better than most people think: at 2.4GHz it would deliver about 2481 MIPs." - but, according to the graph, Opteron already delivers 2723 @ 2.3Ghz. So it is puzzling to see that it "would" deliver less MIPS (2481 vs 2723) at higher frequency (2.4 vs 2.3 Ghz) (regardless of any Intel results/frequencies)
silverblue - Saturday, February 22, 2014 - link
It's entirely possible that the score is down to the 6376's 3.2GHz turbo mode.
plext0r - Friday, February 21, 2014 - link
Would be nice to run benchmarks against a Quad E5-4650 system for comparison.
blaktron - Friday, February 21, 2014 - link
... you know you can't, right?
blaktron - Friday, February 21, 2014 - link
Nevermind, read v2 there where you didn't write it. Too much coffee....
usernametaken76 - Friday, February 21, 2014 - link
For the more typo-sensitive reader (perhaps both technically astute and typo-sensitive):
"A question like "Does the SPARC T5 also support both single-threaded and multi-threaded applications?" must sound particularly hilarious to the our technically astute readers."
...to the our...
JohanAnandtech - Friday, February 21, 2014 - link
Fixed. Thx!
TiGr1982 - Friday, February 21, 2014 - link
From the conclusion:"The Xeon E7 v2 chips are slated to remain in data centers for the next several years as the most robust—and most expensive—offerings from Intel."
I don't think it will be really "several" years - maybe 1-2 years later this Ivy Bridge-EX-based E7 v2 will probably be superseded by Haswell-EX-based E7 v3 with Haswell cores with AVX2/FMA, which should make a difference in pro floating point calculations and data processing, and working with DDR4.
Kevin G - Friday, February 21, 2014 - link
The Ivy Bridge-EX -> Haswell-EX transition will mimic the Nehalem-EX -> Westmere-EX transition in that the core systems provided by the big OEMs will stay the same. The OEMs will offer Haswell-EX as a drop-in replacement for their existing socket 2011v1 systems. Haswell-EX -> Broadwell-EX will again be using the same socket and follow a similarly quick transition. SkyLake-EX will bring a new socket design (perhaps with some optical interconnects?).
At some point Intel will offer new memory buffer chips to support DDR4. This will likely require a system to swap out all the memory daughter cards, but the motherboards from the big OEMs shouldn't change. There may also be a period where these large systems can be initially configured with either DDR3 or DDR4 based upon customer requests.
Kevin G - Friday, February 21, 2014 - link
And a quick addition:
There will indeed be a quick adoption of Haswell-EX, not because of AVX2 or DDR4 but rather transactional memory support (TSX). For the large databases and applications these systems are targeted at, TSX should prove to be helpful.
TiGr1982 - Friday, February 21, 2014 - link
I agree, TSX should make a lot of sense for these E7's - they have a huge core count and huge shared memory at the same time.
Schmide - Friday, February 21, 2014 - link
I think your L3 latency numbers are off. I think typical Intel L3 latencies are 30-40 clocks ~3-4ns.
Schmide - Friday, February 21, 2014 - link
Oops, my bad, I misused the calculator. Ignore.
dylan522p - Friday, February 21, 2014 - link
No power consumption numbers?
JohanAnandtech - Saturday, February 22, 2014 - link
Coming... we had to run lots of tests in parallel, so it was not possible to make sure all systems were similar. Also we should test with workloads that require a lot more memory to get an idea.
mslasm - Friday, February 21, 2014 - link
Note that E7-8857 v2 has 12 cores but no HT, so only has 12 threads as well (see http://ark.intel.com/products/75254/Intel-Xeon-Pro...). Thus it is not equivalent to a 3Ghz E7-4860V2, as the 4860 has HT for a total of 24 threads.
Also, there must be a typo either in the graph or in the text on the "single thread" integer performance test: "Opteron ... at 2.4GHz would deliver about 2481 MIPs", while - according to the graph - it already delivers 2636 @ 2.3Ghz.
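To make the inconsistency concrete, here is a small sketch that projects the graph value to 2.4GHz. It assumes single-thread MIPS scales linearly with clock speed, which is only a rough approximation (turbo modes and memory latency break it), and the function name is made up for illustration. The input numbers are the ones quoted in the comment above.

```python
def scale_by_clock(score: float, from_ghz: float, to_ghz: float) -> float:
    """Project a score to another clock speed, assuming linear scaling."""
    return score * to_ghz / from_ghz


graph_mips_at_2_3 = 2636   # value read from the graph, per the comment
text_mips_at_2_4 = 2481    # value stated in the article text

projected = scale_by_clock(graph_mips_at_2_3, 2.3, 2.4)
print(f"projected at 2.4GHz: {projected:.0f} MIPS")
print(f"article text says:   {text_mips_at_2_4} MIPS")
```

Under that assumption the projection comes out above the graph value, not below it, so one of the two published figures (or the clock they were measured at) has to be off.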
JohanAnandtech - Saturday, February 22, 2014 - link
Good point. There is little gain from HT in OpenFoam, but it will influence the LZMA benchmarks. So the Openfoam findings are still valid, but not the LZMA. The kernel compile is somewhat in between.
JohanAnandtech - Saturday, February 22, 2014 - link
I will rerun the benchmarks without HT to check.
mslasm - Saturday, February 22, 2014 - link
Thanks! I did not mean to imply HT matters "a lot", but it may influence some (and I admit I don't know much about how your benchmarks behave, other than parallel LZMA which I worked a lot with) - so it just does not sound right to outright call it equivalent, and I wish AT only has statements anyone can just trust :)
snoopy1710 - Friday, February 21, 2014 - link
Minor correction on the Dell E7-4890 SAP benchmark, which was done on SUSE Linux Enterprise Server for SAP Applications:
http://download.sap.com/download.epd?context=40E2D...
Snoopy
FunBunny2 - Friday, February 21, 2014 - link
you should opt for ubuntu 12.04. "real" databases are approved only for LTS versions, and 12.04 is the latest.
bji - Friday, February 21, 2014 - link
Page 10 does not contain the Linux Kernel Compile time benchmarks.
JohanAnandtech - Saturday, February 22, 2014 - link
The web engine did something weird... I restored the page
JawsOfLife - Friday, February 21, 2014 - link
Very thorough review, which is what I've come to expect from Anandtech! I am interested but not very knowledgeable about the server side of computing, so this definitely filled me in on a lot of the facets of that area. Thanks for the writeup.
By the way, the "Linux Kernel Compile" page is blank, as bji noted.
JohanAnandtech - Saturday, February 22, 2014 - link
Thx. A glitch in the engine made it delete a page. Restored.
iwod - Friday, February 21, 2014 - link
While the revenues are high, just how many units are shipped?
I have been thinking about whether Intel would move to Mobile First, meaning Atom, Tablet and Laptop chips, which are low power designs, all get the latest node first, while Desktop and Server will be an architecture and node behind. That would align the Desktop and Xeon E3 - E5 Series.
But it seems the volume of Chips dont quite measure out, since the top end volume are far too small? Anyone have any idea on this.
dealcorn - Saturday, February 22, 2014 - link
I believe the statement "Still, that tiny amount of RISC servers represents about 50% of the server market revenues." should read "Still, that tiny amount of RISC servers represents about 50% of the high end server market revenues." Stated differently, from a revenue perspective Intel is #1 vendor in the high end segment even though it has less than a 50% market share. Server orders are placed with vendors, not architectures. Intel has fought an uphill battle to access the high end market and it is costly. However, if Intel can amortize its development costs over a larger revenue base than any competitor, it is well positioned to maintain its share acquisition momentum.
NikosD - Saturday, February 22, 2014 - link
@Johan
Very nice review, I would like to see more benchmarks between E7 v2 vs RISC processors because I think the real competition is there.
Older Intel and AMD servers are not real competition for IvyBridge-EX.
It would be interesting when POWER8 is out, to give us the new figures of 8 socket benchmarks and if there is any progress on more 8+ sockets for Intel E7 v2 (built by Cray and other vendors)
I think that E7 v2 (I don't know about older vendors) can be placed in up to 32-socket systems - not natively of course.
JohanAnandtech - Saturday, February 22, 2014 - link
Older Intel systems are competition, because these kinds of servers are not replaced quickly. If a new generation does not deliver substantial gains, some companies will postpone replacement. In fact, very few people that already have a quad Intel consider the move to RISC platforms.
But you have a point. But it is almost impossible for us to do an independent review of other vendors. I have never seen an independent review, and the systems are too scarce, so there is little chance that we can ask a friendly company to lend us one.
JohanAnandtech - Saturday, February 22, 2014 - link
I meant, I have never seen an independent review of high-end IBM or SUN systems. We did one back in the T1 days, but the product performed only well in a very small niche.
Phil_Oracle - Monday, February 24, 2014 - link
Contact your Oracle rep and I am sure we'd be glad to loan you a SPARC T5 server, which we have in our loaner pool for analysts and press. Would be nice if you had a more objective view on comparisons.
Phil_Oracle - Monday, February 24, 2014 - link
If you look at Oracle's Performance/Benchmark blog, we have comparisons between Xeon, Power and SPARC based on all publicly available benchmarks. As Oracle sells both x86 as well as SPARC, we sometimes have benchmarks available on both platforms to compare.
https://blogs.oracle.com/BestPerf/
Will Robinson - Saturday, February 22, 2014 - link
Intel and their CPU technology continues to impress.
Those kind of performance increase numbers must leave their competitors gasping on the mat.
Props for the smart new chip. +1
Nogoodnms - Saturday, February 22, 2014 - link
But can it run Crysis?
errorr - Saturday, February 22, 2014 - link
My wife would know the answer to this considering she works for IBM, but considering software costs far exceed hardware costs on a life cycle basis, does anyone know what the licensing costs are between the different platforms?
She once had me sit down to explain to her how CPU upgrades would affect DB2 licenses. The system is more arcane and I'm not sure what the cost of each core is.
For an ERP each chip type has a rated pvu metric from IBM which determines the cost of the license. Are RISC cores priced differently than x86 cores enough to partially make up the hardware costs?
JohanAnandtech - Sunday, February 23, 2014 - link
I know Oracle does that (RISC core <> x86 core when it comes to licensing), but I must admit, licensing is extremely boring for a technically motivated person :-).
Phil_Oracle - Monday, February 24, 2014 - link
In total cost of ownership calculations, where HW and SW as well as maintenance costs are calculated, the majority of the costs (upwards of 90%) are associated with software licensing and maintenance/administration, so although HW costs matter, it’s the performance of the HW that drives the TCO. For Oracle, both Xeon and SPARC have a per-core license factor of 0.5x, meaning one license for every two cores, while Itanium and Power have a 1x multiplier, so Itanium/Power must have a 2x performance-per-core advantage to have equivalent SW licensing costs. IBM has a PVU scale for SW licensing, which is essentially similar to Oracle's but more granular. Microsoft's latest SQL licensing follows a similar model. So clearly, performance per CPU, and especially per core, matters in driving down licensing costs.
Michael REMY - Sunday, February 23, 2014 - link
It would have been very good to test this CPU on a 3D rendering benchmark. I can imagine the time saved in a workstation... even if the cost would be close to that of a render farm...
But comparing this Xeon to other ones in that situation would have given a useful point of view.
JohanAnandtech - Sunday, February 23, 2014 - link
What rendering engine are you thinking about? Most engines scale badly beyond 16-32 threads.
colonelclaw - Monday, February 24, 2014 - link
I would like to see V-Ray benchmarked. It's fast becoming an industry standard across a number of 3D industries (it started in ArchVis and is now moving into animated feature films and FX).
PowerTrumps - Sunday, February 23, 2014 - link
The author is misleading with his statements and data, and @Brutalizer comes across as very knowledgeable but only backs up claims about Oracle server performance with platitudes and boasts.
Starting with the article: comparing chips with different core counts is misleading, regardless of whether you adjust the frequency. You need to normalize the values to show what the per-core improvement is. Staying with sockets is useless and lazy. Yes, Intel customers buy servers by the socket, but per-core performance is a much better metric for understanding what they are really gaining. To say there is a 20 or 30% gain when there might be 50% more cores tells me the per-core performance is actually lower than Westmere's. This is important when using software like Oracle, which would price a 15-core socket at 7.5 or 8 Oracle licenses. For software licensed by the core, customers should demand the highest performance available; otherwise all you do is subsidize Uncle Larry's island.
For the Power comparisons in the SAP benchmarks: you compare a 60-core server to a 32-core, N-1 generation Power7 server. Since Power servers scale almost linearly by frequency, the 8-core @ 4.22 GHz result is 54,700 SAPS. If we extrapolate that to 4 sockets, or 32 cores, we would be around 200K SAPS. That is quite a bit more than the 60-core Dell. Also, you could deploy a Power server as a standalone server. Nobody would deploy a mission-critical workload on a standalone x86 server. Yes, I'm sure somebody will argue with me and say they do and have done it for years. OK, but by and large we know they are always clustered and used to scale out.
Secondly, you claim how expensive the Power servers are. When was the last time you priced one, Mr De Gelas? You can get a Power7+ 7R1, 7R2, or 7R4 that has price parity with a typical, comparably equipped x86 server whose price includes Linux and VMware. The 710 and 730 servers would be just a bit more, but definitely competitive.
Factor in the software savings and reduction in the number of servers required and the TCA and TCO will favor Power quickly. I do it all of the time and can back it up with hard data. You can run Power servers up to 90% utilization but rarely run x86 over 30%, maybe 35% tops.
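The per-core normalization and the linear SAPS extrapolation argued above can be checked with a few lines of arithmetic. This is only a sketch: the 30% throughput gain and 50% core-count increase are the illustrative figures from the comment, not measured values, the function names are made up for this example, and perfectly linear scaling is an assumption that gives an upper bound.

```python
def per_core_change(throughput_gain: float, core_gain: float) -> float:
    """Relative per-core throughput change when total throughput and
    core count both grow (gains given as fractions, e.g. 0.30 = +30%)."""
    return (1 + throughput_gain) / (1 + core_gain) - 1


def linear_extrapolation(score: float, cores: int, target_cores: int) -> float:
    """Scale a benchmark score to another core count, ASSUMING perfectly
    linear scaling (an optimistic upper bound, not a measured result)."""
    return score / cores * target_cores


# Illustrative case from the comment: +30% throughput from +50% cores.
change = per_core_change(0.30, 0.50)

# 8-core Power7 result quoted in the comment, extrapolated to 32 cores.
saps_32 = linear_extrapolation(54_700, 8, 32)

print(f"per-core change: {change:+.1%}")
print(f"extrapolated SAPS at 32 cores: {saps_32:,.0f}")
```

With these inputs the per-core figure drops by roughly 13%, and the straight-line extrapolation lands a little above the "around 200K" quoted, which is consistent with the comment allowing for less-than-perfect scaling.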
With regard to @Brutalizer: big claims of big servers, up to 96 TB of RAM. Who needs that? Who needs a server with hundreds or thousands of cores? The Oracle M6-32 has 1000 DIMMs to get 32 TB of memory. Tell us how this influences the MTBF of the server, since the number of components is a major factor in the calculation.
Next, you scoff at IBM for comparing to older servers. That is because they are talking to customers who are running older servers: consolidate those older servers onto a few servers, or just one, that is inherently reliable, and nothing is more reliable than an IBM mainframe, followed by IBM Power servers. Oracle's M6-32 and M5-32 are just cabled T5 servers scaled back from 16 to 12 cores. They have little RAS and are built for marketing hype and to drive Oracle software licensing revenue.
You say the Oracle M processor pricing is X and then try to paint a picture that Power servers are more expensive for a 32-socket than for an 8-socket system. Really? A V8 luxury car is more expensive than a 4-cylinder econobox. The server price is moot when the real cost is the software you run on it. With Oracle EE + RAC at $70,500 + 22% annual maintenance per core, it matters. On Power I only have to license the cores I need. If I need 2 cores for Oracle, then I license 2 cores. On x86, the 15-core chip counts as 8 (15 x 0.5 = 7.5, which rounds up to 8). The Oracle M series is also 0.5, so your 128 cores on SAP S&D against my 64-core Power7 at 1.0 puts us about equal. However, most customers don't run their servers with one workload. You will say your LDOMs are efficient, but compared to the Power Hypervisor they don't hold a candle when it comes to efficiently using the cores and threads - all of them, in true multi-threaded fashion.
With Power8 coming out soon, both Intel and Oracle will go back to smelling the fumes of Power servers. To customers out there: it isn't about being Ford or Chevy. This isn't college - don't root for your team even when they are no good. Your business has to not only survive but hopefully thrive.
Do that on a platform that controls the largest cost which is software and Full Time Equivalents - that is Power servers.
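The licensing arithmetic above (15 x 0.5 = 7.5, rounded up to 8) can be sketched as follows. The core factors and the $70,500 / 22% figures are the ones quoted in this thread, not an authoritative price list, and the function names are hypothetical. Applying the price per license after the core factor, and charging maintenance as a flat 22% of the upfront cost per year, are assumptions of this sketch.

```python
import math

# Figures quoted in the comment (assumptions, not an official price list).
PRICE_PER_LICENSE_USD = 70_500      # Oracle EE + RAC, as quoted
ANNUAL_MAINTENANCE_RATE = 0.22      # 22% per year, as quoted

# Core factors as stated in this thread.
CORE_FACTOR = {"xeon": 0.5, "sparc": 0.5, "power": 1.0, "itanium": 1.0}


def licenses_required(cores: int, platform: str) -> int:
    """Licenses needed: cores times the core factor, rounded up."""
    return math.ceil(cores * CORE_FACTOR[platform])


def license_cost(cores: int, platform: str, years: int = 0) -> float:
    """Upfront license cost plus flat annual maintenance for `years`."""
    upfront = licenses_required(cores, platform) * PRICE_PER_LICENSE_USD
    return upfront * (1 + ANNUAL_MAINTENANCE_RATE * years)


for cores, platform in [(15, "xeon"), (2, "power"), (128, "sparc"), (64, "power")]:
    n = licenses_required(cores, platform)
    print(f"{cores:>3} {platform:<6} cores -> {n:>2} licenses, "
          f"${license_cost(cores, platform):,.0f} upfront")
```

Under these assumptions a 128-core system at a 0.5 factor and a 64-core system at a 1.0 factor need the same number of licenses, which is the "about equal" point made in the comment.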
Phil_Oracle - Monday, February 24, 2014 - link
Well, I must say that this article is clearly Intel-biased, with a lot of misleading and downright wrong statements about Oracle and SPARC. Here are some accurate and substantiated counters:
"Sun/Oracle's server CPUs have been lagging severely in performance"
This is wrong, considering that since the SPARC T4 release, and now SPARC T5 and SPARC M6 announcements, Oracle has announced 20+ world record benchmarks across *all* of the public, audited benchmarks from TPC-C, TPC-H @ 1TB, 3TB, 10TB to SPECjEnterprise2010 and SPECjbb2013. Many of them are still valid today, almost a year later.
What I'd like to ask is: where are the 8-socket Xeon E7 v2 benchmarks to compare to SPARC? There's only one today - SAP. And this doesn’t demonstrate database performance or Java application performance.
There are also no 4-socket or 8-socket benchmarks on TPC-C, TPC-H or SPECjEnterprise2010.
Even with SPECjbb2013, there's just a 4-socket result, and if you compare performance/core, the SPARC T5-2 @ 114,492 max-jOPS (just 32 cores) has a 1.3x performance/core advantage over the NEC Express5800/A040b with 60 x Intel E7-4890 v2 2.8 GHz cores @ 177,753 max-jOPS.
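For readers who want to redo the per-core comparison, here is a minimal sketch that uses only the max-jOPS scores and core counts quoted above (the variable and function names are made up for this example). With those inputs the ratio works out to roughly 1.2x, somewhat below the 1.3x stated, so the quoted advantage may rest on a different core count or result than the ones given here.

```python
def per_core(score: float, cores: int) -> float:
    """Benchmark score divided by the number of cores."""
    return score / cores


# Scores and core counts exactly as quoted in the comment.
sparc_t5_2 = per_core(114_492, 32)   # SPARC T5-2, 32 cores
xeon_e7_v2 = per_core(177_753, 60)   # NEC Express5800/A040b, 60 cores

ratio = sparc_t5_2 / xeon_e7_v2

print(f"SPARC T5-2: {sparc_t5_2:,.0f} max-jOPS per core")
print(f"Xeon E7 v2: {xeon_e7_v2:,.0f} max-jOPS per core")
print(f"per-core ratio: {ratio:.2f}x")
```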
"As usual, the published benchmarks are very vague and are only available for the top models "
As of today, there is not a single real-world application/database benchmark that shows Xeon having superior throughput, response times or even price/performance when comparing systems with the same number of CPUs to SPARC T5. You can go here to see all the comparisons with full transparency: https://blogs.oracle.com/BestPerf/
"and the best performing systems come with astronomic price tags ($950,000 for two servers, some networking, and storage... really?)."
You do realize you are linking to Oracle Exadata which isn't a server but an Engineered system with many servers, storage and networking all built-in and based on XEON??
Why are you not linking to SPARC T5 server pricing, since that’s what you are trying to discredit? Here's the SPARC T5-2 pricing, which is very aggressive compared to x86 and IBM Power7+ systems.
https://shop.oracle.com/pls/ostore/f?p=dstore:5:90...
Or better yet, look at a public benchmark where full HW and SW pricing is disclosed?
A SPARC T5-4 is 2.4x faster than the 8-socket Xeon E7-4870 based HP DL980 G7 on TPC-H at 10TB.
The SPARC T5-4 server HW fully configured costs $268,853, HP DL 980 costs $268,431.
Basically the same cost, and SPARC T5 is 2.4x faster than Westmere-EX. Where's the Xeon E7 v2 result to showcase that it is 2x faster?
Details of pricing and results are here.
http://www.tpc.org/tpch/results/tpch_result_detail...
http://www.tpc.org/results/individual_results/orac...
http://c970058.r58.cf2.rackcdn.com/individual_resu...
On the TPC-C OLTP benchmark, a SPARC T5-8 has a price/performance of 0.55 USD/tpmC, versus 0.89 USD/tpmC for the fastest Oracle x2-8 and 0.59 USD/tpmC for the IBM x3850. The SPARC T5-8 is 70% faster per CPU than the Westmere-EX based Oracle x2-8. http://www.tpc.org/tpcc/results/tpcc_results.asp?o...
Haravikk - Tuesday, February 25, 2014 - link
I wouldn't mind a 60 core Xeon in the next version of Apple's Mac Pro ;)
Desert Dude - Thursday, April 3, 2014 - link
Interesting discussions. Just for clarification, there is an x86 server that goes beyond 8 sockets: bullion (Xeon E7 48xx, up to 16 sockets with near-linear scaling). Bull (legacy GE & Honeywell mainframe) has leveraged technology used in its mainframes and HPC systems to build bullion... the world's FASTEST x86 server. bull.us/bullion