The Fiji GPU: Go Big or Go Home

Now that we’ve had a chance to take a look at the architecture backing Fiji, let’s talk about the Fiji GPU itself.

Fiji’s inclusion of High Bandwidth Memory (HBM) technology complicates the picture somewhat when talking about GPUs. Whereas past GPUs were defined by the GPU die itself and then the organic substrate package it sits on, the inclusion of HBM requires a third layer, the silicon interposer. The job of the interposer is to sit between the package and the GPU, serving as the layer that connects the on-package HBM memory stacks with the GPU. Essentially a very large chip without any expensive logic on it, the silicon interposer allows for finer, denser signal routing than organic packaging is capable of, making the ultra-wide 4096-bit HBM bus viable for the first time.

We’ll get to HBM in detail in a bit, but it’s important to call out the impact of HBM and the interposer early, since they have a distinct impact on how Fiji was designed and what its capabilities are.

As for Fiji itself, Fiji is unlike any GPU AMD has built before, and not only because of HBM. More than anything else, it’s simply huge: 596mm² to be precise. As we mentioned in our introduction, AMD has traditionally shied away from big chips, even after the “small die” era ended, and for good reason. Big chips are expensive to design, expensive to produce, take longer to develop, and yield worse than small chips (this being especially the case early on for 40nm). Altogether they’re riskier than smaller chips, and while there are times when they are necessary, AMD had never reached that point until now.

The end result is that for the first time since the unified shader era began, AMD has gone toe-to-toe with NVIDIA on die size. Fiji’s 596mm² die is just 5mm² (<1%) smaller than NVIDIA’s GM200, and more notably still, it hits TSMC’s 28nm reticle limit. TSMC can’t build chips any bigger than this; Fiji is as big a chip as AMD can order.

AMD Big GPUs
GPU                 Die Size   Native FP64 Rate
Fiji (GCN 1.2)      596mm²     1/16
Hawaii (GCN 1.1)    438mm²     1/2
Tahiti (GCN 1.0)    352mm²     1/4
Cayman (VLIW4)      389mm²     1/4
Cypress (VLIW5)     334mm²     1/5
RV790 (VLIW5)       282mm²     N/A

Looking at Fiji relative to AMD’s other big GPUs, it becomes clear very quickly just how significant this change is for AMD. When Hawaii was released in 2013 at 438mm², it was already the biggest GPU AMD had ever built. And yet Fiji dwarfs it, coming in 158mm² (36%) larger. The fact that Fiji arrives in the latter half of the 28nm process’s lifetime means that such a large GPU is not nearly as risky now as it would have been in 2011/2012 (NVIDIA surely took some licks internally on GK110), but still, nothing else we can show you today sells the significance of Fiji to AMD as well as the die size does.
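The die-size comparisons above reduce to simple arithmetic. The following is a minimal Python sketch using the areas quoted in this article; note that GM200’s 601mm² is derived from the statement that Fiji is 5mm² smaller, not quoted directly, and `growth` is a hypothetical helper name.

```python
# Die areas in mm^2 as quoted in this article.
FIJI_MM2 = 596
HAWAII_MM2 = 438
# Derived, not quoted: Fiji is stated to be 5 mm^2 smaller than GM200.
GM200_MM2 = FIJI_MM2 + 5


def growth(new, old):
    """Return (absolute difference, percentage difference relative to `old`)."""
    delta = new - old
    return delta, 100.0 * delta / old


fiji_vs_hawaii = growth(FIJI_MM2, HAWAII_MM2)
fiji_vs_gm200 = growth(FIJI_MM2, GM200_MM2)

print(f"Fiji vs Hawaii: {fiji_vs_hawaii[0]:+d} mm^2 ({fiji_vs_hawaii[1]:+.1f}%)")
print(f"Fiji vs GM200:  {fiji_vs_gm200[0]:+d} mm^2 ({fiji_vs_gm200[1]:+.2f}%)")
```

Run as written, this reports Fiji as 158mm² (about 36%) larger than Hawaii and under 1% smaller than GM200, matching the figures in the text.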

And the fun doesn’t stop there. Along with producing the biggest die they could, AMD has also more or less followed NVIDIA’s lead with Maxwell in the case of Fiji, building what is unambiguously the most gaming/FP32-centric GPU the company could build. With GCN supporting power-of-two FP64 rates between 1/2 and 1/16, AMD has gone for the bare minimum FP64 performance their architecture allows, leading to a 1/16 FP64 rate on Fiji. This is a significant departure from Hawaii, which implemented native support for a 1/2 rate and offered a handicapped 1/8 rate on consumer parts. Fiji will not be an FP64 powerhouse (its 4GB of VRAM is arguably already too great a handicap for the HPC market), so instead we get AMD’s best FP32 GPU going up against NVIDIA’s best FP32 GPU.
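The FP64 rates discussed above are ratios against FP32 throughput, so their effect is easy to see with exact fractions. This sketch enumerates the power-of-two rates the text says GCN supports and applies them to a normalized FP32 peak of 1.0; that normalized peak is a placeholder, not a measured figure for any card, and `fp64_peak` is a hypothetical helper.

```python
from fractions import Fraction

# Per the text, GCN supports power-of-two FP64:FP32 rates from 1/2 down to 1/16.
GCN_FP64_RATES = [Fraction(1, 2**k) for k in range(1, 5)]  # 1/2, 1/4, 1/8, 1/16

# Rates quoted in this article.
FIJI_NATIVE = Fraction(1, 16)
HAWAII_NATIVE = Fraction(1, 2)
HAWAII_CONSUMER = Fraction(1, 8)  # handicapped rate on consumer Hawaii parts


def fp64_peak(fp32_peak, rate):
    """FP64 throughput implied by an FP32 peak and an FP64:FP32 rate."""
    return fp32_peak * float(rate)


# Fiji sits at the floor of what the architecture allows.
assert FIJI_NATIVE == min(GCN_FP64_RATES)

# Placeholder FP32 peak of 1.0, so results read as fractions of FP32 throughput.
for name, rate in [("Hawaii native", HAWAII_NATIVE),
                   ("Hawaii consumer", HAWAII_CONSUMER),
                   ("Fiji", FIJI_NATIVE)]:
    print(f"{name:16s} rate {rate}: FP64 = {fp64_peak(1.0, rate):.4f} x FP32 peak")

# Per unit of FP32 throughput, Hawaii's native FP64 rate is 8x Fiji's.
print(HAWAII_NATIVE / FIJI_NATIVE)
```

Using `Fraction` keeps the rates exact, so the comparison between Hawaii’s native rate and Fiji’s comes out as exactly 8 rather than a rounded float.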

AMD’s final ace up their sleeve on die size is HBM. Along with HBM’s bandwidth and power benefits, HBM is also much simpler to implement, requiring less GPU space for PHYs than GDDR5 does. This is partly because HBM stacks have their own logic layer, which moves some of the logic onto each stack, and partly because the signaling logic that remains doesn’t have to be nearly as complex, since the frequencies are so much lower. 4096 bits of HBM PHYs still take up a fair bit of space (though AMD won’t tell us how much), but it’s notably less than the space AMD was losing to Hawaii’s GDDR5 memory controllers.
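The trade-off of a very wide bus running at low frequency shows up clearly in peak-bandwidth arithmetic. This sketch assumes per-pin data rates that are not stated in this section: 1 Gbps per pin for the HBM on the R9 Fury X (500 MHz, double data rate) and 5 Gbps per pin for the 512-bit GDDR5 bus on the Hawaii-based R9 290X. `peak_bandwidth_gbs` is a hypothetical helper.

```python
def peak_bandwidth_gbs(bus_width_bits, gbps_per_pin):
    """Peak bandwidth in GB/s: pins x per-pin data rate, divided by 8 bits per byte."""
    return bus_width_bits * gbps_per_pin / 8


# Assumed figures from reference specifications, not from this section:
HBM_BUS_BITS, HBM_GBPS = 4096, 1.0      # R9 Fury X: 500 MHz DDR -> 1 Gbps per pin
GDDR5_BUS_BITS, GDDR5_GBPS = 512, 5.0   # R9 290X (Hawaii): 5 Gbps per pin

fiji_hbm = peak_bandwidth_gbs(HBM_BUS_BITS, HBM_GBPS)
hawaii_gddr5 = peak_bandwidth_gbs(GDDR5_BUS_BITS, GDDR5_GBPS)

print(f"Fiji HBM:     {fiji_hbm:.0f} GB/s")
print(f"Hawaii GDDR5: {hawaii_gddr5:.0f} GB/s")
print(f"HBM bus is {HBM_BUS_BITS // GDDR5_BUS_BITS}x wider, "
      f"but each pin runs {GDDR5_GBPS / HBM_GBPS:.0f}x slower")
```

Under those assumptions the HBM bus delivers 512 GB/s against 320 GB/s for Hawaii, even though each HBM pin runs at a fifth of the GDDR5 data rate; width, not clock speed, carries the bandwidth, which is why the remaining PHY logic can be simpler.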

The end result is that not only has AMD built their biggest GPU ever, but they have done virtually everything they can to maximize the die space they can allocate to FP32 and rendering resources. Simply put, AMD has never before reached this high or aimed this directly for parity with NVIDIA.

Ultimately this puts Fiji’s transistor count at 8.9 billion, even more than the 8 billion found in NVIDIA’s GM200 and, as expected, significantly more than Hawaii’s 6.2 billion. Interestingly enough, on a relative basis this is almost exactly the same increase we saw with Hawaii; Fiji packs in 43.5% more transistors than Hawaii, and Hawaii packed in 43.9% more transistors than Tahiti. So going by transistors alone, Fiji is very much to Hawaii what Hawaii was to Tahiti.
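The generation-over-generation percentages above can be reproduced directly. Fiji’s and Hawaii’s counts are quoted in this article; Tahiti’s 4.31 billion is a commonly cited figure that is not stated here, so treat it as an outside assumption. `pct_increase` is a hypothetical helper.

```python
# Transistor counts in billions. Fiji and Hawaii are quoted in the article;
# Tahiti's 4.31 billion is an outside figure assumed for this sketch.
TRANSISTORS_B = {"Tahiti": 4.31, "Hawaii": 6.2, "Fiji": 8.9}


def pct_increase(new, old):
    """Percentage increase of `new` over `old`."""
    return 100.0 * (new / old - 1.0)


fiji_over_hawaii = pct_increase(TRANSISTORS_B["Fiji"], TRANSISTORS_B["Hawaii"])
hawaii_over_tahiti = pct_increase(TRANSISTORS_B["Hawaii"], TRANSISTORS_B["Tahiti"])

print(f"Fiji over Hawaii:   {fiji_over_hawaii:.1f}%")
print(f"Hawaii over Tahiti: {hawaii_over_tahiti:.1f}%")
```

Both ratios land within half a percentage point of each other, which is the basis for saying Fiji is to Hawaii what Hawaii was to Tahiti.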

Finally, as large as the Fiji GPU is, the silicon interposer it sits on is even larger. The interposer measures 1011mm², about 1.7 times the size of Fiji. Since Fiji and its HBM stacks all need to fit on top of it, the interposer must be very large to do its job, and in the process it pushes its own limits. The interposer die is believed to exceed the reticle limit of the 65nm process AMD is using to build it, and as a result the interposer is carefully constructed so that only the areas that need connectivity receive metal layers. This allows AMD to put down such a large interposer without needing a fab capable of reaching such a large reticle limit.
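The relationship between the interposer and the GPU die it carries can be checked with the two areas quoted in this article. This sketch only computes the size ratio and the interposer area not covered by the GPU die, which has to accommodate the HBM stacks and their routing; it makes no assumption about the stacks’ own footprint.

```python
FIJI_DIE_MM2 = 596      # Fiji GPU die, as quoted
INTERPOSER_MM2 = 1011   # silicon interposer, as quoted

ratio = INTERPOSER_MM2 / FIJI_DIE_MM2
beyond_gpu = INTERPOSER_MM2 - FIJI_DIE_MM2

print(f"Interposer / Fiji die: {ratio:.2f}x")
print(f"Interposer area beyond the GPU die: {beyond_gpu} mm^2")
```

The result, roughly 1.7x and a little over 400mm² of extra area, shows how much silicon beyond the GPU itself the package has to carry.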

What’s interesting from a design perspective is that the interposer and everything on it is essentially the heart and soul of the GPU. There is plenty of power regulation circuitry on the organic package and even more on the board itself, but within the 1011mm² floorplan of the interposer, all of Fiji’s logic and memory is located. By mobile standards it’s very nearly an SoC in and of itself; it needs little more than external power and I/O to operate.


458 Comments


  • Ryan Smith - Thursday, July 2, 2015

    The current tools do not report VRM temperatures for the card (AFAIK). I've taken an IR thermometer to the card as well, though there's nothing terribly interesting to report there.
  • guld82 - Thursday, July 2, 2015

    Civilization: Beyond Earth
    The bigger advantage of Mantle is really the minimum framerates, and here the R9 Fury X soars. At 1440p the R9 Fury X delivers a minimum framerate of 50.5fps
    1440p should be changed to 4k
  • jeffrey - Thursday, July 2, 2015

    Ryan Smith, any update on GTX 960
  • Ryan Smith - Thursday, July 2, 2015

    As soon as Fury is out of the way.
  • chizow - Thursday, July 2, 2015

    Fair review Ryan, unfortunately for AMD Fury X will go down as an underwhelming product that failed to meet the overhyped build up from AMD and their fans. It's not a terrible product by itself, as it does perform quite well, but it simply didn't live up to its billing, much of which came directly from AMD themselves when they made very public claims like:

    1) HBM enables them to make the World's Fastest GPU. This didn't happen.
    2) Easily beats the 980Ti, based on their internal benchmarks. This didn't happen either.
    3) Fury X is an Overclocker's Dream. We've seen anything but this being the case.
    4) Water Cooling allows this to be a cool and quiet part. Except that pump whine, that AMD said was fixed in shipping samples, but wasn't.
    5) 4GB is enough. Doesn't look like it, especially at the resolutions and settings a card like this is expected to run.

    Add to that the very limited supply at launch and Fury X launch will ultimately be viewed as a flop. I just don't know where AMD is going to go from here. R9 300 Rebrandeon happened (told you AMD fanboys this months ago) and those parts still aren't selling. R9 Fury X while still AMD's best performing part is still 3rd fastest overall at the same price point as the faster 980Ti, and in extremely limited supplies. Will this be enough to sustain AMD into 2016 where the hopes of Zen and Arctic Islands turning around their fortunes loom on the horizon, we'll see, but until then it will be a bumpy road for AMD with some cloudy skies on the horizon!
  • Thatguy97 - Thursday, July 2, 2015

    i fear this
  • Stuka87 - Thursday, July 2, 2015

    The pump whine was fixed. Only very early cards have the old pump, later cards do not. And even with the louder pump, it's STILL quieter than a reference 980Ti.
  • chizow - Thursday, July 2, 2015

    Even if that is the case, that's not what AMD was telling the press when it was brought to their attention during the review phase. Obviously it would be difficult, if not impossible for AMD to correct the problem in shipping samples given how rushed they were just getting review samples out to the press.

    AMD was dishonest about the pump issue, plain and simple, and just hopes the pump whine falls below any individual's noise tolerance threshold.

    As for comparisons to 980Ti, the Fury X will certainly be quieter in terms of pure dB under load, but the noise profile of that pump whine is going to be far more disturbing at any other point in time.
  • mapesdhs - Friday, July 3, 2015

    Beats me why nobody makes more of the practicality issues of trying to fit such a card in a case which in all likelihood (for this class of GPU) _already has_ a water cooler for the CPU, and don't get me started on how one is supposed to cram in two of these things for CF (not that I'd bother given the state of the drivers; any DX9 fix yet? It's been over a year).

    Without a clear performance advantage, which it doesn't have, it needed to be usefully cheaper, which it's not. Add in the lesser VRAM and no HDMI 2.0 and IMO AMD has blown this one. It wouldn't be so bad except it was AMD that chucked out much of the prelaunch hype. Other sites have shown differences to the 980 Ti of a lot more than 10% at 1440p (less so at 4K of course, though with only 4GB and CF build issues I don't see 4K as being a fit for this card anyway). Factory oc'd 980 Tis cost only a little more, but are significantly quicker even at 4K.
  • chizow - Sunday, July 5, 2015

    Yeah, Fury X is not really a smaller form factor, it's just different. Fitting that double thick rad is going to pose a much bigger problem for most cases vs. a full sized 9.5" blower, given Nvidia's NVTTM reference fits most any mini-ITX case that can take 2 slots.

    As for Fury X price and perf, I think the 980Ti preemptively cut into AMD's plans and they just didn't want to take another cut in price when they had their sights set on that $800+ range. But yeah Fury X and by proxy, Fury Air and Fury Nano will be extremely vulnerable at 1080p and 1440p given they will be slower than Fury X which already has slower and last-gen cards like the 290X/390X/780Ti and GTX 980 on its heels.

    I don't think AMD could've afforded more price compression or there's simply no spots that make any sense for Fury Air and Fury Nano, which again goes to my point they should've just launched these parts as the top end of their new R9 300 series stack instead of Rebrandeon + Fury strategy.
