The Bulldozer Aftermath: Delving Even Deeper
by Johan De Gelas on May 30, 2012 1:15 AM EST
It has been months since AMD's Bulldozer architecture surprised the hardware enthusiast community with performance all over the place. The opinions vary wildly from “server benchmarks are here, and they're a catastrophe” to “Best Server Processor of 2011”. The least you can say is that the idiosyncrasies of AMD's latest CPU architecture have stirred up a lot of dust.
Now that the dust has settled, Bulldozer chips account for more than half of Opteron shipments and revenues. Since AMD's Financial Analyst Day (February 2, 2012), we have new code names: the improved Bulldozer architecture "Piledriver" will power the "Abu Dhabi" chip, a replacement for the current top server chip "Interlagos". AMD is clearly committed to the new "Bulldozer" direction: fitting as many cores as possible into a certain power envelope to improve thread throughput, while trying to "hold the line" on single-threaded performance.
In theory, the new 16-core Interlagos should have offered somewhere around a 33% boost in most highly-threaded applications. The reality is unfortunately not that rosy: in many highly-threaded server applications such as OLAP databases and virtualization, the new Opteron 6200 fails to impress and is only a few percent faster than its older brother, the 12-core Magny-Cours. There are even times when the older Opteron is faster.
Some, including sources inside AMD, have blamed GlobalFoundries for not delivering higher clocked SKUs. Sure, the clock speed targets for Interlagos were probably closer to 3GHz than to 2.3GHz. But that does not explain why the extra integer cores do not deliver. We were promised up to 50% higher performance thanks to the 33% extra cores, but we got 20% at the most.
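The figures above can be turned into a rough per-core comparison. The sketch below is only a back-of-the-envelope check, not a measurement: it assumes throughput scales perfectly linearly with core count, and the helper name `per_core_ratio` is made up for illustration.

```python
# Back-of-the-envelope check using only the figures quoted in the text.
# Assumption (not from the article): throughput scales linearly with cores.
OLD_CORES = 12  # Magny-Cours
NEW_CORES = 16  # Interlagos


def per_core_ratio(total_speedup, old_cores=OLD_CORES, new_cores=NEW_CORES):
    """Per-core throughput of the new chip relative to the old one.

    total_speedup is the whole-chip speedup (1.20 means 20% faster).
    """
    return total_speedup / (new_cores / old_cores)


core_gain = NEW_CORES / OLD_CORES - 1   # extra cores as a fraction
promised = per_core_ratio(1.50)         # "up to 50% higher performance"
observed = per_core_ratio(1.20)         # "20% at the most"

print(f"extra cores:        {core_gain:.1%}")
print(f"promised, per core: {promised:.3f}x")
print(f"observed, per core: {observed:.3f}x")
```

Under that linear-scaling assumption, the promised 50% would have required each new core to do about 12.5% more work than a Magny-Cours core, while the best observed 20% implies each new core does roughly 10% less.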
The combination of low single-threaded performance, the failure to really outperform the previous generation in highly-threaded applications, the relatively high power consumption at full load, and the fact that the CPU is designed for high clock speeds gives a lot of people a certain sense of déjà vu: is this AMD's version of the Pentium 4?
One of our readers, "Iketh", spoke up and voiced the opinion of many of our readers:
" Unfortunately, the thought still in the back of my mind while reading was why did AMD reinvent the Pentium 4? I just don't get it."
Another reader nicknamed "Clagmaster" commented:
"A core this complex in my opinion has not been optimized to its fullest potential. Expect better performance when AMD introduces later steppings of this core with regard to power consumption and higher clock frequencies."
Although there have already been quite a few attempts to understand what Bulldozer is all about, we cannot help but feel that many questions are still unanswered. Since this architecture is the foundation of AMD's server, workstation, and notebook future (Trinity is based on the improved Bulldozer core with the codename "Piledriver"), it is interesting enough to dig a little deeper. Did AMD take a wrong turn with this architecture? And if not, can the first implementation "Bulldozer" be fixed relatively easily?
We decided to delve deeper into the SAP and SPEC CPU2006 results, as well as profiling our own benchmarks. Using the profiling data and correlating it with what we know about AMD's Bulldozer and Intel's Sandy Bridge, we attempt to solve the puzzle.
Homeles - Wednesday, May 30, 2012
This. Read 3rd party reviews (like AnandTech!) -- several of them -- and draw your conclusions from there. That's pretty much the point of reviews; if marketing teams could provide honest, reliable benchmarks over a wide range of applications, we'd have little need for 3rd party reviews.
Mugur - Thursday, May 31, 2012
Well... they actually did!
moravista - Wednesday, May 30, 2012
Great article Johan! I have been reading your articles since the Pentium III / K6-2 days and have really enjoyed them! Thanks for sharing your insight! Keep 'em coming!
JohanAnandtech - Friday, June 1, 2012
Great to hear from you. Did you use to participate in the different forums under a different callsign?
muy - Wednesday, May 30, 2012
i want a phenom II x4 980+ on 32 nm. this whole idea of "lets put as many crippled dual cores on a die and smack a level 3 cache on top and call it our next cpu" is utter crap for stuff that doesn't multi thread well (95 % of all stuff).
6 core bulldozer i bought to replace my amd x3 450 is slower than the chip i wanted to replace at the same clock speed. now i have a shiny asus rog mb, a x3 450 powering it, and a 6 core bulldozer gathering dust. what a waste of money that was.
shame i can't find any x4 970+'s anymore and amd is too foolhardy to keep manufacturing their best gaming cpu's, let alone do a shrink on them to 32 nm.
i can only imagine how much better a phenom 2 x4 9xx, default clocked at 4.2 ghz+ would be than any bulldozer. (and how much cheaper to manufacture considering the die size compared to the die size of bulldozer).
i just don't understand amd.
Roland00Address - Wednesday, May 30, 2012
Microcenter has the following processors:
1045t six core for $99
965 quad core black edition for $99
960t quad core black edition for $89 (this model is a disabled six core and has a possibility of unlocking to a 6 core). The 960t is a clearance processor, so it is available while supplies last.
fic2 - Thursday, May 31, 2012
Those are all 45 nm. He is wanting a tick - a die shrunk Phenom II.
Would have to agree with him. If AMD would do a die shrink they would have a killer product - assuming GloFo didn't f*ck it up.
muy - Wednesday, May 30, 2012
bulldozer doesn't do single threaded, highly branching (cough games cough) stuff well.
and before you say "some games use multiple cores", i'll say that 1 core running on 100 % and 7 cores at 5 % is not a good use of multi threading.
(1 * 100) + (7 * 5) = (1 * 100) + (1 * 35) = 1.35 cores used. this means that a DUAL core going at 10 % higher speed than the exampled 8 core would be 10 % faster than the 8 core 'using' its 8 cores.
clock speed + ipc are the only things that matter 90% + of the time for games.
wolfman3k5 - Wednesday, May 30, 2012
People don't buy CPUs based on theoretical performance, ideology or brand loyalty (OK, some fan-boys do). Most of us are not computer engineers, and even if we were, it wouldn't matter, because at the end of the day only the end result would matter: performance, efficiency and price. Just like I didn't buy Intel because it looked good on paper back in the glory days of AMD (cca. 2005). So no matter how deep and involved these articles are, AMD still trails Intel when it comes to performance, and it will do so until their lazy and incompetent CPU engineers get off their lazy butts and start working. The sole reason why Bulldozer was such a massive fail was because most of the design process was highly automated. So, stop slacking and start working, lazy AMD engineers!
Homeles - Wednesday, May 30, 2012
Being a "lazy" electrical engineer is practically impossible. The amount of work that has to go into making these processors simply function is quite massive. These guys work hard to get to where they are with their careers and work even harder to keep those careers. The margin of error here is also quite huge... a small flaw can create enormous performance penalties.
I'd be willing to bet that many, if not most of Bulldozer's shortcomings could be blamed on management. Saying it was "lazy engineers" is callous and ignorant.