179 Comments

  • shabby - Wednesday, May 10, 2017 - link

    Slow down nvidia, give amd a chance to catch up...
  • ddriver - Wednesday, May 10, 2017 - link

    Why? AMD doesn't compete in that particular market.
  • shabby - Wednesday, May 10, 2017 - link

    No, they don't, but it's a new architecture that will eventually trickle down to consumers. AMD doesn't even have anything to compete with Nvidia's last GPU.
  • ddriver - Thursday, May 11, 2017 - link

    This ain't gonna "trickle down to consumers", not ever. It is a completely different product, for a completely different market. It is not technologically ahead of the consumer GPUs, it is the same architecture, simply not crippled. And nvidia will always cripple consumer products, so they don't eat into their way higher margin markets. This product, in its crippled form is already available to consumers.

    AMD could easily make something similar if they weren't always in the red. AMD is NOT technologically inferior to nvidia, it is just poor and doesn't have the resources to bring out its new architectures and modifications in time. Had AMD had more money, they'd have launched Vega a year ago, and it would have been perfectly competitive with nvidia's 1000 line. AMD has been forced into an underdog position by years of illegal business practices and also from within through bad management. Which is why they don't have enough money to push their arch in time, which is why they don't have the money to develop such big and yield-unfriendly dies, because they have neither the margins nor the market to sell such a thing.
  • fallaha56 - Thursday, May 11, 2017 - link

    we'll let you stew on how long it takes you to realise that today's data centre Tensor is tomorrow's game AI processor...

    apart from bandwidth and FP64 there really isn't any difference in nVidia's products

    (and i'm a big AMD fan)
  • ddriver - Thursday, May 11, 2017 - link

    Sorry to break it to you, but it's just a fad. It won't last. Like stereoscopic gaming. Like VR, which is already dying out. Machine learning is not AI, it is not even machine learning, it is machine training. And in the context of games that won't happen during the actual gameplay, but will be "trained in" well in advance. And I hate to say it, but between back when hardware was 100 times slower and now, I haven't seen any tangible improvements in game "AI". It has always been lame, and it has nothing to do with computational budget.

    I am still waiting on them hover cars they promised we'd have about 20 years ago or so.
  • T1beriu - Thursday, May 11, 2017 - link

    VR is a fad? Wow.
  • Mark_Hughes - Friday, May 12, 2017 - link

    Back in the early 90's, when I was close to leaving school, I was asked what job I would like to do. I said I wanted to be a computer programmer. The careers advice guy said to me "Computers are just a fad, they have no real future". Incredible to think that attitude still exists today with things...

    I eventually became a programmer after being a mechanic for years. I work on 3D CAD software and I can say for a fact that VR is not a "fad" for our customers.

    Wow indeed.
  • sfg - Tuesday, August 15, 2017 - link

    Thanks for the anecdote. However, if you were really alive in the early 90s you must know that VR had already been a fad several times. Can't blame people for not trusting it. The comparison with computers in general is just a strawman argument and really stupid.
  • AlphaBlaster - Sunday, May 14, 2017 - link

    Hello World!! Today is a nice day, as this is officially my first post on AnandTech, and since that is the case it is more of an experimental nature; it might be interpreted as a little bit de-contextualized.
  • WinterCharm - Wednesday, May 17, 2017 - link

    He's just a negative Nancy. Apparently, 4K is a fad too. And so is DirectX.
  • sfg - Tuesday, August 15, 2017 - link

    Well, and you're just a Dumb Allison. Just so you know, you're not smarter just because you're optimistic about everything. And if someone is disagreeing with your point of view you might want to consider that simply calling him "negative" doesn't suddenly make your belief the correct one. What is "negative" even supposed to mean? That he doesn't believe everything he's told? Yeah, I can see how that's bad and you are actually right. Sure.
  • sfg - Tuesday, August 15, 2017 - link

    Well, T1 Beriu, I'm sure you have some deep insight on why VR will do better now than in the previous attempts, right? I mean, you know about those, right? You're not just a 15 year old who got his panties wet by some VR demo and thinks this is the future of everything, right?
  • Strunf - Thursday, May 11, 2017 - link

    Call it as you want, but AI that can adapt to a player's game style is the future... just like the robot that learned how to play hockey and score, one could imagine an AI that would learn from the way you play and become far more effective against you. There's no way to plan this ahead of time because every player plays differently and changes strategy depending on circumstances and on a whim. The AI also controls many different enemies, but one could imagine different AIs controlling different characters to give it a more human feeling.

    There hasn't been much improvement because it's hard to implement and game-specific, but an open-source AI that learns by itself could be easily implemented in all games, a bit like DeepMind learned to play games.
  • gamerk2 - Thursday, May 11, 2017 - link

    The problem with AI is it gets computationally expensive in a hurry. You'd need a dedicated GPU specifically for it if you want to go much beyond the basic behaviors that have been rehashed over the past decade or so (See player->Kill player).

    FEAR had good (even great) AI, and when you listen to the devs talk about it (The IEEE had an article on it a few months back), you see it was a well thought out development process. But for most games, AI is an afterthought.
  • melgross - Thursday, May 11, 2017 - link

    What is needed, and will come, is an AI engine, as we see for game development now. Once that's standardized, as it will be, then developers can leave much of the AI to the AI engine (software) and do the rest of the work. That engine will run on the AI compute GPU.

    You know it's coming. It's just a matter of when.
  • tisch - Thursday, May 11, 2017 - link

    Can you provide a link to the article in IEEE, or name the IEEE journal and date for it? I can't seem to find it.
  • melgross - Thursday, May 11, 2017 - link

    VR is not fading out. It's just beginning. If you're thinking of the stupid Glass from Google, that's one thing, and it's not VR, but AR.

    VR will become major once prices come down to the price of an accessory, which will be at least a couple of years from now, and possibly three, and once performance goes up enough to eliminate all of the lag, which is one reason people get sick using it. That will also take another two to three years.

    It's really silly comparing this to hover cars, which is itself a silly idea.
  • michael2k - Thursday, May 11, 2017 - link

    You would claim we won't have on the fly AI, only pretrained AI?

    Is that like how Myst was prerendered 3D and how no one renders 3D graphics dynamically?

    Oh, wait, that's NVIDIA's entire value add, because prerendered graphics weren't good enough. I imagine in a decade that AI that continues to train will be the norm.
  • AlphaBlaster - Sunday, May 14, 2017 - link

    Dear ddriver,

    Your above post is one of the most insightful, realistic and farsighted contributions you made over the last while. Each and everything stated there is completely true and a big cause for concern. In a world of finite resources with global ecosystems degradation caused by human overpopulation, overconsumption, the capitalists fanatism for endless growth, climate change has a high probability of causing the conditions which lead to a nuclear war. As soon as it becomes possible technological progress is going to be used to design, develop and manufacture autonomously operating killer robots which will take care of the deplorables, the lesser beings on this earth. As everyone is free to witness by visiting any large computer, gaming or smartphone related site, most of the users there live in their own little bubbles of ignorance, stupidity and denial, paired with displaced believe in technological fixes for which the laws of physics provide no basis. Their natural reaction when someone points out the flaws in their para-logic and reasoning is met with anger, disgust, disbelieve and furious attempts to shoot the messenger. These people, while not the primary cause, constitute a huge part of the ever evolving and accelerating worldwide degradation of the eco- and -biosphere in a multitude of ways. Most things the consumer electronics industry produces today, are resource intensive, ecologically toxic products which contaminate the biosphere and whose planned obsolescence only serves one purpose: to make money. As this is now the de facto standard worldwide in most of the consumer oriented industries, there is a huge potential for deterioration. Furthermore most of the people here are meat eaters, thus animal holocaust profiteers. There aren't that many activities which one is able to engage in which are more devastating than unnecessary meat consumption. Most of the worlds crops are used to feed livestock which is incredibly resource itensive, requiring farmland where once forests were, fertilizers, huge amounts of fossil fuels to produce them and to power the machines necessary to harvest and process the crops, water and fossil fuels to fatten the livestock just to be murdered with purpose of being fed to ignorant lazy faggots. There is no other industry more abhorrent and outright evil than the meat industry which lets the holocaust and all what the Nazis did look like child play in comparison. Each and everyone can start today with making the world a better place by becoming a vegan. Tofu tastes good and is healthy; it is readily available in various shapes and forms.
    The world needs more bright and far sighted people like you, thus please don't let yourself be discouraged by the bunch of lazy faggots who have the audacity to call you a troll, you aren't one and please continue to share your enlightening contributions wherever you deem them to be appropriate.

    Thank you very much!!
  • theuglyman0war - Sunday, May 14, 2017 - link

    The only gaming I have done is stereoscopic since 2010, and I won't buy a monitor that doesn't support 3D Vision. Between Helix 3DMigato and WSGF/Hayden's Flawless 3D Surround community solutions... I usually have no problems. The industry and naysayers confuse saturation with obsolescence. Stereoscopic markets between theater, VR and gaming will be a multi-billion-dollar industry as long as we have two eyes.
  • CiccioB - Thursday, May 11, 2017 - link

    Sorry to disturb your dreams, but AMD has inferior technology with respect to nvidia.
    I do not know how you calculate "not being inferior", but nvidia's technology allows it to use less energy to do the same GPGPU work and also less silicon for the 3D pipeline.
    Combined, nvidia offers much better products, which AMD can counter only by producing much beefier and more energy-hungry GPUs, making their upper line match nvidia's lower line.
    Margins are then quite different.
    And those margins are what make AMD "poor". This has been the case for years now. GCN has annihilated any AMD advantage (with Terascale at least they could make smaller chips).
    Now they are behind from every point of view: size, energy and performance.
    Saying that AMD is not inferior is just a symptom of being blinded by red glasses.

    Again sorry, but reality is different.
    BTW, this is a beast that has no competition from any other HPC card producer. Vega won't touch the heels of this GPU. It just means nvidia needs to cut this down to the bare minimum, as they did with GP102, and they'll have a winning GPU that is going to crush Vega without having to use expensive HBM2 (another reason for the different margins... one uses optimized technology that makes maximum use of bandwidth, the other just feeds in as much bandwidth as possible at gargantuan cost).

    I may say that when nvidia releases Volta it will be a complete generation ahead of AMD, which will only have something to fight back with in Navi, a year later. Possibly too little, too late.
  • vladx - Thursday, May 11, 2017 - link

    Damn CiccioB, you're a savage
  • ddriver - Thursday, May 11, 2017 - link

    Nvidia has enough money to produce different chip versions of the same family - amd does not.

    Nvidia cuts everything that is non essential to games to produce a castrated chip, which obviously, will consume less power, since it is much less capable. They have been cutting literally every corner, which should be obvious even to a fanboy like you, considering that their legacy title performance advantage is not reflected into DX12 and Vulkan titles, because the hardware isn't really "next gen" capable for the sake of transistor savings.

    Amd meanwhile is forced to use the same chip for both mainstream and professional products, thus the chip has to have some redundancies, which hurt its power efficiency in tasks which do not call for such features, because they don't have the resources to produce numerous flavors of every chip. Which is why amd is also forced to make more future proof designs, that actually have performance benefits from running DX12 and Vulkan titles rather than the performance degradation exhibited by nvidia's GPUs.

    Amd is also making this transition, however, due to the shortage of funds, far more slowly. They now have different chip flavors for their high end workstation GPUs, whereas a couple of years back it was still the same chip, just different drivers. Which is why amd GPUs had tremendously better FP64 rates, like 1/2 or 1/3 of FP32, whereas nvidia gutted that a long time ago. And it should go without saying, but I'll say it nonetheless, since it doesn't look like reasoning is your strong point: if you sack FP64 throughput to 1/32 you will obviously save on some power.

    Another big power saver for nvidia is tile based rendering, the patent on which expired recently. Amd will also incorporate it in Vega, but again, due to the lack of funds, it took much longer to implement than it took nvidia.

    Amd could easily scale its designs up to match that thing if it had the resources to make it and a market to sell it in, but they don't have either. Nvidia's advantage is entirely logistical. Their technological superiority only exists in the wet dreams of fanboys like you.

    Amd being "poor" doesn't have anything to do with the quality of its products; amd was poor even when its products had a pronounced edge over the competition.

    "How can you say AMD is not inferior is just a mere symptom of being blinded by red glasses."

    Nah, I just understand the technology, the industry and the market, unlike you, who only understands marketing hype.
  • Strunf - Thursday, May 11, 2017 - link

    AMD doesn't have the money to make different flavors of the same chip? What do you call the RX 480, RX 470 and RX 460? And now the RX 580, 570 and 560. Different flavors cost little to nothing, it's easy peasy to disable parts of a chip; taping out optimized versions costs more in R&D but also gives bigger returns (less waste). And what about the FURY thing: sure, no money for some stuff, but money to spend on experimental HBM products they seem to have...

    Tile based rendering a patent? It's over-20-year-old technology, and why would it be worthy of a patent? It's just a division of the image into smaller blocks.

    Well, being poor didn't stop AMD from buying ATI... AMD has money (big investors behind it); they just made bad choices.
  • fanofanand - Thursday, May 11, 2017 - link

    Your lack of understanding is stunning! By different chips, he means compute vs graphics. If tile based rendering sucked, then why is Nvidia now using it (now that the patent has expired)? The 480/470/460 are NOT different chips, they are the same chip with some of the SMs disabled. Just stop commenting and start reading. Learn a bit more before you start spouting off. You made me defend ddriver, which is not something I'm comfortable with.
  • CiccioB - Thursday, May 11, 2017 - link

    You should learn as well. Fiji was a completely new architecture with respect to previous GCN based GPUs, a project that cost a lot of money. The simple justification for that wasted amount of money was the use of the new HBM memory, which did not bring any advantage.
    BTW, Fiji was not a compute chip; on the contrary, AMD decided to create a big chip only for the consumer market (no DP capability). So they did make a distinction between the consumer and compute markets, but they got it wrong because they preferred to follow other priorities (trying to get at least one chip ahead of the competition after so many years behind). They failed nonetheless.
    Another bunch of money thrown away when it could have been used more cleverly to optimize Polaris, which could have been more efficient, and then they could also have made a larger chip to really fight GP104.

    The situation of the current AMD architecture did not happen by chance overnight. It is the evolution of a series of choices AMD has made over these years. If the situation is that they cannot afford two really different architectures, that is their responsibility.
    They started out thinking that combining FP32 ALUs to get FP64 ones was a clever move to save die space. But that had other drawbacks that they thought were not so important.
    Unfortunately, they are (mainly efficiency). Now, it is not nvidia's problem if they decided to make much more efficient architectures by sacrificing die space on compute GPUs (which, however, are sold with premium margins, so it does not matter) and so are able to offer better products in both markets.
    AMD will arrive at the same strategy (it is already there, to tell you the truth), but as has often happened in the last 10 years, it has arrived at the same conclusion as the competition only after seeing it applied by others, and a couple of years late.
  • mat9v - Thursday, May 11, 2017 - link

    AMD did not fail with Fiji, it works great as a pro chip; it's just that they made it with not enough ROPs for the number of SPs they included. Yes, it was a serious error in the project, but it does not mean that the whole architecture was a bust. ROPs, after all, are not used in professional computations. Fiji also had a less than stellar amount of memory. At the time of Fiji's premiere they were not "years behind Nvidia", at most one generation, and the Fury X barely lost to the 980 Ti while beating the standard 980. In fact, the 980 Ti was a reaction from Nvidia to the Fury X premiere. Granted, Fury was a far less efficient chip than the 980 family, and OC was a bust there.
    Yup, you are right, Nvidia has much more money (AMD's valuation is about $10B while Nvidia's is about $70B) and is in the green, so they can afford to create different designs for different markets. It's hard to tell how the design and production costs impact margins, but I guess they have enough volume to justify that. It is a sad day when, on Amazon, the first 15 places on the "most units sold/popular" list are occupied by Nvidia cards :( Hopefully Vega will make some inroads into clients' hearts :)
  • CiccioB - Friday, May 12, 2017 - link

    How can you say Fiji was not a fail?
    It could not satisfy a single point of its original intentions.
    1. Win against GM200 no matter the cost: it failed, despite super expensive HBM.
    2. Be used in professional cards: it failed; 4GB of RAM is 4-year-old capacity.
    3. Make AMD some money: it failed; its cost was so high, and nvidia's clever move of placing the GTX 980 Ti at quite a low price (lower than was really needed) just killed any margin AMD could have thought of making on it.
    4. Give AMD experience with HBM to have an advantage with the next generation: failed; AMD did not use HBM in any form, while nvidia started using HBM2 as soon as it was available. A year and a half later AMD has still not launched a card with HBM2 memory.

    What are the winning points of Fiji? Working great as a Pro chip? Where? It was not even put into a professional-class card, except that Pro Duo that was created just to dispose of the last Fiji chips, which otherwise would have had to be sold for the price of a beer. And they probably sold ten of them on the entire planet.
    Fiji has been the most disappointing GPU since R600. So much cost, so many expectations, so little returned. Saying that it did great against the 980 is like saying that Hawaii is a great GPU because it managed to beat a 750 Ti. You can compare it with whatever you want and be happy it is faster, but the lower the comparison, the lower the value of the GPU.
  • CiccioB - Thursday, May 11, 2017 - link

    With this you've just dropped the mask and shown the small, shallow fanboy in you.

    As for the FP64 blabbing, I would just recall that both Tonga and Fiji were highly castrated from this point of view, and yet Fiji + HBM could not reach GM200 performance except in a few optimized situations at 4K (and had really great problems at lower resolutions, where even Hawaii was sometimes near its performance), while Tonga (aka the 380) was a monster (in dimensions and power consumption) against the much smaller and super efficient GM206 (aka the GTX 960).
    Last but not least, Polaris has no better FP64 capability than consumer Pascal, and yet the 580 uses twice the energy and a bigger die (on a denser process, nonetheless) than the 1060 it tries to beat.

    The fact that AMD has always tried to sell FP64 capability in the consumer market (since Terascale) is all their own fault, not a problem of money. It was the wrong strategy, and AMD seems to have understood how stupid it was.
    However, it only lets them spare a few watts compared to the immense inefficiency of their architecture.

    About DX12 performance, what are you talking about with "future proof design"? The one that gains 10% in performance using double the energy? 30% more silicon? 30% more TFLOPS to do the same work?
    Are you implying that using all these extra resources to gain 10% in performance in those (few) games optimized for AMD's architecture (requiring further work paid for by AMD to support its "advanced tech") is a "future proof design"?
    Future-proof for when? 10 years after release? We are more than 2 years past the DX12 launch, in a market dominated by console ports running on AMD DX12-capable HW... still nothing that really makes DX12 overcome DX11 (ah, the low-level promises of much better performance in exchange for much harder and costlier work for the programmers).
    Unfortunately for you, who still look at reality through red glasses, in DX12 nvidia games are not left in the dust like your dreams make you think. They are quite competitive. It does not require nvidia to revolutionize their architecture to make up that 10% difference, while keeping their GPUs always smaller and cheaper than the competing ones. And new DX11 games are still constantly released, making "the advanced AMD architecture" look ridiculous.

    Being stronger while using more resources is not an indication of having "better technology", my dear.
    On the contrary. Running an entire F1 grand prix with a Prius and finishing 1 meter ahead of a McLaren (that's enough to win) doesn't indicate that the two cars have the same level of technology. Don't you agree? And even on the few occasions when the McLaren wins by 1 meter, you can't say "huuuhuu.. what a fantastic, future proof car.. we will use it for next year as well!"

    AMD is well behind nvidia's current architecture. You may believe what you want about FP64 (which I proved you just do not know what you are talking about), you just admitted that nvidia's rasterization is superior (and AMD is going to copy it a couple of generations later, like it did with much of nvidia's architecture when passing from Terascale to GCN), nvidia's memory bandwidth usage is better (as witnessed by similarly performing GPUs like GP106 and Polaris 10, or by the fact that GP104 with the same bandwidth can deliver 50% more performance), and you can babble whatever you have heard in some fanboy club about this useless (performance-wise) asynchronous technique that just tries to reduce AMD's architectural inefficiencies. You know Fiji is an 8.9 TFLOPS GPU and GM200 a 5.6 TFLOPS GPU, don't you? Or that Polaris 10 is a 5.5 TFLOPS GPU and GP106 just 4.3 TFLOPS? WOW, async gives the former a 10% advantage in Vulkan based games! And 6% in DX12 games that heavily use async, that is, only the 4 that are based on DICE engines that have been paid by AMD just to exploit all of AMD's architectural capacity! 10% more! With double the power usage (see the 580 hyper-OC at 240W to get this advantage)! Incredible! How much better AMD's architecture is, isn't it?

    Come on, you can do better than this. Just wake up and stop believing in your wet dreams where AMD is something that in reality it is not. You may have been sleeping too much to be aware that nvidia cards using smaller and cheaper components are sold for a higher price. Maybe because nvidia cards are better than AMD ones, which have to be underpriced to be sold in decent numbers?
    The poor AMD is poor because it has been selling products with ridiculous margins for years, because their technology was sub-par. It is doing the same today. It seems it won't be different with Vega, and you dare to say that AMD is not technologically owned by nvidia?
    TAKE OFF THOSE RED GLASSES, PAL! Or you won't understand anything of what is coming next, just as you have not understood why "the poor" AMD is in this position today.
  • medi02 - Thursday, May 11, 2017 - link

    Oh, chizow is back...
  • Ranger1065 - Friday, May 12, 2017 - link

    Please God no more Chizoo. I'm with ddriver 100%, your comments usually put the peasants in their place.
  • K_Space - Monday, May 15, 2017 - link

    LMAO! I sniffed something vaguely familiar about that long rant....
  • Meteor2 - Thursday, May 11, 2017 - link

    TL;DR.

    Do you think people's time is of such little value to them that they'll read such a long rambling comment?
  • CiccioB - Thursday, May 11, 2017 - link

    You are free to read the comments that just state "W AMD" "W nvidia" "AMD is great" "nvidia is greater"
  • mat9v - Thursday, May 11, 2017 - link

    Yes, you are right that Fiji is a bust in games, not so in professional tasks. You also know that most of the savings in Maxwell and Pascal cards come from tile rendering? And consequently from increased clocks. It seems that it will be much different in Vega as it adopted similar (though differently named, of course) technology; hopefully it will give it the same advantage. Remember that professional tasks are the worst-case scenario for the card in terms of power consumption; TR should, again should, make it much more efficient when playing games.
    You are so hung up on FP64, as if it were the be-all and end-all of professional computing, while in deep learning tasks the FP16 data format is used the most. https://devblogs.nvidia.com/parallelforall/inside-...
    You are right, Polaris is inefficient, but it was and is a stopgap measure until AMD is able to finish Vega. Why do you think AMD never even tried to build a chip on the Polaris architecture with 4096 SPs at lower clocks? It would be too large and too inefficient, though it might even be able to compete with the 1080 - but at what cost? It might be even worse than Fury when scaled like that.
    We will see if AMD's jump to TR brings a real advantage over earlier designs.
  • CiccioB - Friday, May 12, 2017 - link

    I'm not hung up on FP64 at all. There was a statement that the lack of FP64 was the reason for nvidia being so efficient. I just demonstrated that it was not true.
    Now you come and say that the real reason is tile rendering and high clocks. High clocks usually never make efficiency better. Usually the best strategy is using more silicon at lower clocks, just as AMD does. However, given how bad AMD's architecture really is, it can't compete even with bigger dies and lower clocks.
    nvidia has made fast shaders since G80 (where they originally ran at 3 times the uncore clock, then double with Fermi). It changed that with Kepler (tripling the number of shaders), and with Maxwell it raised the entire GPU clock even higher while decreasing power consumption.
    Tile based rendering is only a part of the optimizations that nvidia has developed over the years to make its architecture consume less energy. Cache and bandwidth are other optimizations that AMD has only lately started to adopt (let's even say copy). It is the sum of many things that gives nvidia's architecture the capacity to be more efficient both in terms of power consumption and of computation (calculated as theoretical maximum TFLOPS vs. time to complete a piece of work).
    AMD is years behind nvidia in GPGPU optimizations, as nvidia started 5 years earlier with GPGPU (hitting its own share of problems and costs) while AMD was enjoying creating games-only GPUs with the Terascale architecture. When they moved to real GPGPU those 5 years were all visible, and today it has not managed to close the gap. The road to that is still quite long.

    Again, speaking of monstrous GPUs to be put against lilliputian ones makes no sense. You can't create a Ferrari-class car and say that you will price it to compete with a Prius. It's useless even if your Ferrari is 5 mph faster than the Prius. Unless you are able to create a Ferrari-class car at the cost of a Prius. Which does not seem to be an AMD skill.
  • nikon133 - Sunday, May 14, 2017 - link

    I partially lost you at the "Prius winning against the McLaren" part. Or losing by a meter. In reality, the McLaren would lap the Prius every 3 laps or so. Even with the crappy Honda engine (if we are talking about the McLaren F1 car).

    What are you saying...?
  • helvete - Thursday, August 3, 2017 - link

    exactly
  • melgross - Thursday, May 11, 2017 - link

    Nah, you actually don't understand this technology. That's pretty obvious.
  • vladx - Thursday, May 11, 2017 - link

    Don't mind ddriver, he's just on one of his usual daily rants
  • medi02 - Thursday, May 11, 2017 - link

    It's called the process node and it has nothing to do with nVidia.
  • mat9v - Thursday, May 11, 2017 - link

    Aren't you forgetting about the Instinct cards from AMD? Same target audience, 12.5 TFLOPS of performance from 300W of power, available for more than 2 months now (premiered at GDC 2017, if I remember correctly). It was no paper launch, as those cards are already working in, for example, LiquidSky. They may be a bit less energy efficient, but they are also here, and according to Nvidia those Tesla cards will be available to buyers at the end of the 3rd quarter - wouldn't you all agree that it is a strange launch for Nvidia?
    As much as the Tesla V100 is a great card, and despite all the internal changes alluded to by Ryan, we have about a 40% increase in performance for about a 40% increase in SP count, with almost constant core clock and power use. That suggests that there were no architectural improvements aside from Tensor Cores, as the IPC increased by only 2.6% (carefully normalized for core boost clock). Theoretically, the process change from 16nm to custom 12nm allowed for a 40% increase in efficiency that was fully utilized to increase the SP count, at the additional cost of increased chip size due to (probably) the Tensor Core inclusion. Oh, and 2MB more cache too.
  • CiccioB - Friday, May 12, 2017 - link

    How do you calculate IPC? By using theoretical FMA throughput? Which is fixed at 2 per clock?
    You understand that is not the right way to do the math...
    You have to measure work done / theoretical TFLOPS to see how it has improved over other architectures. But unless you have a Volta GPU working today, you can't know how the IPC has changed.
    And no, those Vega based cards are not available yet, despite being shown months ago.

    What is strange about nvidia's presentation of Volta? nvidia did the same in 2015 with Pascal, which came out 8 months later (exactly as it seems Volta will, or maybe even earlier).
    You all seem too focused on trying to find nvidia alarmed by AMD's moves and on seeing "strange moves" in its strategy.
    nvidia is well aware of HBM2 availability and of how large AMD's launch quantities can be, independent of performance. I do not really think they are worried one bit by what AMD is doing. They already make enough money with current margins and will nullify AMD's margins on Vega by having priced the 1080 Ti so low. So they know that for at least a year and a half they won't have any problem, and at the same time they cut into AMD's gains on the new architecture.
  • jimjamjamie - Thursday, May 11, 2017 - link

    So Nvidia customers are not going to be able to buy 12nm FFN GPUs, and AMD can do it easily but they actually can't. Got it.
  • eddman - Thursday, May 11, 2017 - link

    "it is the same architecture, simply not crippled. This product, in its crippled form is already available to consumers."

    So Ryan is a liar now? Did you even read the article?

    "Volta is a brand new architecture for NVIDIA in almost every sense of the word. While the internal organization is the same much of the time, it's not Pascal at 12nm with new cores (Tensor Cores). Rather it's a significantly different architecture in terms of thread execution, thread scheduling, core layout, memory controllers, ISA, and more."

    What do you even know about volta besides what is written in this article? Your personal bias is through the roof.
  • ddriver - Thursday, May 11, 2017 - link

    There are some changes, but they are minuscule. And the bulk of those changes is due to the adoption of HBM, not the minor core tweaks.

    Whether or not this is "significantly different" boils down to one's idea of "significant".

    To a poor man, 100$ is significant, to a rich man, 100$ is not even enough for walking around money.

    Lastly, when you enjoy generous sponsorship from nvidia (and NOT from amd), you are inclined to overexponate.
  • eddman - Thursday, May 11, 2017 - link

    So you ARE calling ryan a liar. Good that it's cleared up now.
  • Ranger1065 - Friday, May 12, 2017 - link

    Storm in a teacup dude, pop a chill pill.
  • CiccioB - Thursday, May 11, 2017 - link

    "There are some changes, but they are minuscule. And the bulk of those changes is due to the adoption of HBM, not the minor core tweaks."

    But Pascal already uses HBM2! So what are those changes you are denying?
  • Meteor2 - Thursday, May 11, 2017 - link

    You must be on some strong chemicals.
  • Ryan Smith - Thursday, May 11, 2017 - link

    "Lastly, when you enjoy generous sponsorship from nvidia (and NOT from amd), you are inclined to overexponate."

    To be clear here, this isn't a sponsored trip. AnandTech is paying its own way on this (and our other) trade show trips.
  • Yojimbo - Thursday, May 11, 2017 - link

    There are significant changes to GV100 as compared to the GP100. They expanded the SIMT model to be able to deal with thread divergence differently and allow fine-grained locking. They enhanced the L1 cache to allow code to take advantage of the GPU with less manual tuning. They reduced instruction and cache latencies. They added dedicated INT32 cores to the SMs. The unified memory management has been enhanced and hardware acceleration for parts of the CUDA multi-process server were added.

    The GV100 has 40% more SMs than the GP100 but less than 40% more transistors. They both have the same amount of L2 cache and register files per SM, while GV100 has 128 KB of configurable L1 cache/shared memory per SM compared to GP100's smaller and inflexible 24KB of L1 and 64KB of shared memory. GV100 also contains two extra warp schedulers per SM and a new L0 instruction cache that's apparently faster than their old instruction buffer. Add in the independence of the INT32 cores. Then consider that the cores needed to be properly meshed together to allow the tensor cores to work properly (I don't know how they work, but I can't imagine the tensor cores are completely separate units considering the lack of extra transistors in the GV100. If they added entirely new execution units for the tensor cores without increasing the transistor count per SM that would require an even more impressive reworking of the SM.) All this with less transistors per SM for the GV100 GPU compared to the GP100. Obviously, the SMs were seriously reworked.

    We haven't seen how significant the architectural changes are in terms of graphics, yet, but in terms of compute, Volta significantly ups the game for allowing easier and more extensive extraction of data parallelism in workloads and for the acceleration of deep learning.
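    As a rough back-of-the-envelope check of the SM/transistor claim above, a minimal sketch using the published die figures (GV100: 84 SMs, ~21.1B transistors; GP100: 60 SMs, ~15.3B transistors); uncore is lumped in with the SMs here, so treat the per-SM numbers as ballpark figures only:

        # Approximate transistors-per-SM comparison; uncore is ignored.
        gv100_sms, gv100_transistors = 84, 21.1e9
        gp100_sms, gp100_transistors = 60, 15.3e9

        print(f"SM increase:         {gv100_sms / gp100_sms - 1:.1%}")                  # ~40.0%
        print(f"Transistor increase: {gv100_transistors / gp100_transistors - 1:.1%}")  # ~37.9%
        print(f"GV100 transistors/SM: {gv100_transistors / gv100_sms / 1e6:.0f}M")      # ~251M
        print(f"GP100 transistors/SM: {gp100_transistors / gp100_sms / 1e6:.0f}M")      # ~255M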
  • Yojimbo - Thursday, May 11, 2017 - link

    Oh, in that whole post I forgot about what's perhaps the biggest indication of massive changes between the GV100 and the GP100. The Tesla V100 is 50% more power efficient than the Tesla P100 even though the power efficiency gains from the underlying process technologies (16FF+ to 12FFN) cannot possibly approach anywhere near that number.
  • mat9v - Thursday, May 11, 2017 - link

    15 TFLOPS / 10.6 TFLOPS = 41.5% more performance from the same power. There is no need to account for clock changes, as we are not looking at IPC.
    BTW, IPC generation to generation increased by 2.6%, so not much; probably they put most of their work into creating the Tensor Cores and optimizing transistor usage (space) for increased performance.
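    As a minimal sketch of the throughput arithmetic above, assuming the commonly quoted specs (V100: 5120 FP32 cores at ~1455 MHz boost; P100: 3584 FP32 cores at ~1480 MHz boost):

        # Peak FP32 rate = cores * 2 FLOPs (one FMA) * clock. Per-core, per-clock
        # throughput is a fixed 2 FLOPs on both chips, so the gain tracks core count.
        def peak_fp32_tflops(cores, clock_ghz):
            return cores * 2 * clock_ghz / 1000.0

        v100 = peak_fp32_tflops(5120, 1.455)   # ~14.9 TFLOPS
        p100 = peak_fp32_tflops(3584, 1.480)   # ~10.6 TFLOPS
        print(f"{v100:.1f} vs {p100:.1f} TFLOPS -> {v100 / p100 - 1:.1%} higher peak")   # ~40%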
  • eddman - Friday, May 12, 2017 - link

    Flops =/= performance
  • CiccioB - Friday, May 12, 2017 - link

    Your transistor "count" is somewhat wrong.
    You can't say it has 40% more SMs with less than 40% more transistors.
    A GPU is not made only of shaders or SMs. Just a part of it (though a large one) is occupied by what is considered the core. There's also the uncore part, which has not scaled and so didn't require extra transistors.
    For example, the memory controller is the same in GV100 and GP100. The ROPs (which are not so small) are the same. So those 6 billion new transistors have to be divided almost "only" between the new ALUs and TMUs (I don't know the impact on the thread scheduler).
    Seeing what's inside a GM200 with its 8 billion transistors, these 6 billion transistors mean that nvidia has added a complete new GM200 (memory and bus controllers aside) to GP100. So there are enough transistors for anything.
    I too do not know if the Tensor Core is just a re-path of old ALUs or a completely new optimized SIMD block (seeing the numbers, it could be a re-path), as I do not know how difficult it can be to set up the path for configuring the Tensor Core on request and then continue with other calculations (FP16 and INT32, which are used by the Tensor Core) on the next instruction.

    By the way, 6 billion transistors are quite a lot of transistors to be used for computation resources, and it does not imply that the SMs had to "shrink" in transistor count to accommodate the new ALUs.
  • Yojimbo - Friday, May 12, 2017 - link

    "Your transistor "count" is somewhat wrong.
    You can't say it has 40% more SM with less than 40% more transistor."

    Yes, I can say that. It's a verifiable fact.

    "A GPU is not made only of shaders or SM. Just a part of it (though large) is occupied by what is considered the core. There's also the uncore part which has not scaled and so didn't require extra transistors."

    Yes, I know that, CiccioB, but approximation is the best we can get. It's a flawed approximation, but I don't think it's a useless one. And comparing the GP100 and GV100, two chips with very similar architectures (both use HBM2, etc), aimed at the same market, one with 60 SMs and the other with 84, should result in a reasonably good approximation. Yes, one would expect the total number of transistors per SM to fall with more SMs, but the SMs take up the bulk of the transistors on the chips, and other parts of the chip have similarly been scaled up in size, such as the ROPs and texture mapping units. There could of course be big changes in the size of the NVENC and NVDEC blocks, etc, in which case we'd be a bit out of luck. Comparing GM200 to GP100 is not a good idea, btw. GM200 does not contain a significant number of 64 bit capable FP units in its SMs. Better to compare the GM200 with the GP102.

    What I see is that both Maxwell and Pascal seem to both add a significant number of transistors per core over their immediate predecessors. The performance improvements in Maxwell over Kepler seemed to come via having more performance per core than Kepler at the same clock, while transistors per core rose. The performance improvements in Pascal over Maxwell seemed to come primarily from higher clock rates enabled, while transistors per core rose. GV100 seems to get its performance increase primarily by being more energy efficient than Pascal. Significantly more cores can be added and run at the same clock speed as the GP100 while maintaining the same total power draw. GV100 seems to maintain the clock speed boost from Pascal, add new flexibility and features to its SMs, but yet still maintain a similar transistors per SM count as Pascal, or possibly even walk it back slightly.

    A significant number of features were added to each SM in the GV100, as noted. It's a very rough calculation and far from perfect, but I think my approximation has merit.
  • CiccioB - Monday, May 15, 2017 - link

    Of course approximation is required when talking about this matter, but I just had the impression that you were trying to split hairs over the transistor numbers with respect to the number of added ALUs.
    SMs are a big part of the GPU die, but they are not the whole of it, as I said.
    If you scale only the SMs, net of the memory controller, ROPs and thread manager, you'll end up with a number of added transistors that justifies the addition of the new ALUs (all of them as independent ALUs) without believing that the SM transistor count had to shrink to accommodate them all.
    I made the comparison with GM200 just to show how many things you can put in 6 billion transistors. Of course an SM comparison can't be made, but in GM200's 8 billion transistors you have 3000+ FP32 ALUs, a 384-bit MC, 96 ROPs and 176 TMUs, that is, a complete beefed-up GPU.
    What we see is the addition of 6 billion transistors only for the core part of the GPU. Which is not a small amount.
  • Yojimbo - Monday, May 15, 2017 - link

    "If you scale only the SMs at net of memory controller, ROPs and thread manager, you'll end up with a number of added transistor that justify the addition of the new ALUs (all of them as independent ALUs) without believing that the SM transistor count has to be shrunk to accomodate them all."

    You said my claim was too strong considering the approximations that need to be made and now you've gone on to make a claim just as strong as mine. What reason do you have to believe this?
  • Gasaraki88 - Thursday, May 11, 2017 - link

    What are you talking about? This is what the next consumer card will be based off of. It has always been that way, and it even says so in the article.
  • sfg - Tuesday, August 15, 2017 - link

    If AMD was not an underdog they'd have the same business practices as NVIDIA or any other corporation in a position of near monopoly. AMD is not made of saints nor is NVIDIA made of devils. They both only care to make as much money as possible and use the methods available to them. Period.
  • splashgizmo - Friday, October 20, 2017 - link

    Starts out his tirade with "this ain't" and then goes on to state AMD is not technically inferior. Completely disregard this diatribe.
  • Samus - Thursday, May 11, 2017 - link

    Compare AMD's growth margins to nVidia's over the last decade (since AMD purchased ATI) and it's pretty clear who is winning the GPU wars. AMD has always had some great value propositions, but nVidia has always had faster, more efficient cards. Their stock price reflects that. In the tech community, only Apple has had a growth trajectory this vertical since then.

    That said, I can't believe how much silicon they are going to waste. Even with the V100 Hyperscale (which is likely going to be a salvaged defect with some shit disabled) they are unlikely to have even 50% yields on a new process unless A LOT of the Hyperscale GPUs' internals are disabled, and it has been designed to be reconfigurable around defects.

    Lastly, the 12nm "FFN" is incredibly intriguing. I understand nVidia is in the position to get its own manufacturing line from TSMC, but perhaps this process has more to do with improving yields specifically for V100.
  • CarrellK - Thursday, May 11, 2017 - link

    A bit misleading and inaccurate...

    Comparing AMD's growth margins is apples-to-oranges, as AMD was/is primarily a CPU company and nV was/is primarily a GPU company.

    nV has not "always had faster, more efficient cards". The GPU generation spanning AMD's RV7xx through Southern Islands on the whole favored AMD. In particular nV's 45nm products were very power inefficient in comparison.

    As for silicon waste, I'll be surprised if nV's yield is at/below 50%. With good physical models (something nV did NOT have at 45nm) and redundancy with reconfigurability they should be close to 70%.

    A bit later in the comments some Vega opinions are expressed. Without opining too much, I'll say that AMD is doing well to recover from the BOD's and Rory Read's attempts to completely kill AMD's GPU business, as well as the significant loss of GPU team and leadership talent. nV and Apple have profited by scooping up a lot of that lost talent.
  • CiccioB - Thursday, May 11, 2017 - link

    Nv (like AMD) never used a 45nm PP.
    Nv went directly from 65nm to 40nm, using 55nm only for die-shrinking the old but, at the time, still completely competitive G92.
    AMD bet on 55nm and GDDR4, which didn't give it the needed boost to surpass nvidia's offerings in terms of performance.
    Yet Terascale was much more efficient (being GPGPU-limited). Once AMD came out with what it thought was an efficient GPGPU architecture, it lost all its advantages and started falling back in everything, from die size to energy consumption.

    At the prices nvidia is selling these beasts for, yields have no meaning. They could even get two working GPUs from an entire wafer. They would still make a lot of money nonetheless.
  • vladx - Thursday, May 11, 2017 - link

    As mentioned elsewhere, 12nm "FFN" simply means 12nm FinFET Node; it's not some process specifically optimized for Nvidia.
  • vladx - Thursday, May 11, 2017 - link

    Again, I put this quote here:

    "Taiwan Semiconductor Manufacturing Company (TSMC) has secured 12nm chip orders from Nvidia, MediaTek, Silicon Motion Technology and HiSilicon for the fabless firms' different chip products, according to industry sources."
  • zepi - Thursday, May 11, 2017 - link

    I don't think there is anything 12nm about the process. I bet it is just a slightly improved 16nm process.

    GV100: 21000Mtra / 815mm2 = 25.77MTra/mm2
    GP100: 15300Mtra / 610mm2 = 25.08MTra/mm2

    That is such a minuscule difference in density that it could easily be caused by a change of transistor configuration, not by the transistor dimensions themselves.

    I'm thinking there is some truth to Intel's complaint that others are just BS'ing their node names.
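    Running the density numbers above as a quick sketch (die sizes and transistor counts as quoted in the comment):

        # Transistor density in MTr/mm^2, from the figures quoted above.
        gv100_density = 21000 / 815   # ~25.77
        gp100_density = 15300 / 610   # ~25.08
        print(f"GV100: {gv100_density:.2f} MTr/mm2, GP100: {gp100_density:.2f} MTr/mm2")
        print(f"Density gain: {gv100_density / gp100_density - 1:.1%}")   # ~2.7%, essentially flat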
  • vladx - Thursday, May 11, 2017 - link

    "I'm thinking there is some truth to Intel's complaint that others are just BS'ing their node names."

    Obviously, both TSMC and Samsung have to resort to marketing to create the illusion of being equal to or ahead of Intel, which sadly makes even some techies believe their BS.
  • CiccioB - Thursday, May 11, 2017 - link

    This 12nm process does not increase density, but improves energy savings and, most of all, production cost, as it uses fewer masks and allows for much bigger die sizes, like this monstrous one.
  • peevee - Monday, May 15, 2017 - link

    That is true. The "nodes" were BS after 32nm. Even Intel's 10nm is BS too.
  • medi02 - Thursday, May 11, 2017 - link

    The MI25 promised 12.5 TFLOPS vs 15 TFLOPS in this thing.
    Not that big of a difference.

    PS
    Oh, and market is rather small so far.
  • theuglyman0war - Sunday, May 14, 2017 - link

    I love how Nvidia has gone large on the die to own. If Moore's law is dead, the performance limit is still only bounded by the size of the universe.
    Since the power consumption complaints about the GTX 480, I have always felt that power cost has been exaggerated for desktop use (after I actually added up 16 hours a day, 7 days a week on that tri-SLI and laughed at the trivial (relative to enthusiast expense) cost!).
  • hanssonrickard - Thursday, May 18, 2017 - link

    http://www.anandtech.com/show/11403/amd-unveils-th...
  • zepi - Wednesday, May 10, 2017 - link

    When Nvidia can make about $15k from each 800mm2 die manufacturing V100s, it really has no good reason to do 400mm2 GV104s it can sell for at most $600 each...

    And as long as the price discrepancy keeps being this huge, they'd rather sell 4x GV100 "workstations" for $69k and 8x GV100 DGX-1 rack servers for $149k.
  • vladx - Wednesday, May 10, 2017 - link

    Yes, but these high-end compute GPUs pay for keeping all the consumer parts that much cheaper.
  • jjj - Wednesday, May 10, 2017 - link

    Quite the opposite nowadays, really.
    Consumer parts are funding the development of these monsters.
    AMD gave Nvidia the opportunity to keep prices high above $300, and they did that just when high FPS, high res and VR pushed consumers' needs up.
  • vladx - Thursday, May 11, 2017 - link

    That's not the point; if these high-end cards weren't in play, Nvidia would've had to ramp their prices even higher for their consumer line. Just selling a few thousand of these monsters is good enough.
  • Eliadbu - Wednesday, May 10, 2017 - link

    You are totally wrong. Nvidia won't sell those cards anywhere near as much as consumer-grade cards, and the consumer cards will eventually produce much higher earnings than those compute cards, since they sell so many more units. Remember that a big part of the cost of the cards is not the fabrication of the die and the manufacturing of the card, but R&D; $3 billion was put into this card's development, and if you want to recoup it, you need to sell more units, since each unit then carries less of the R&D cost.
  • Demiurge - Wednesday, May 10, 2017 - link

    Who says companies that shell out the money for those cards are buying just a single one?
  • fanofanand - Wednesday, May 10, 2017 - link

    You have to factor in yields. Nvidia would love if every die was flawless, but they aren't even selling a fully enabled GPU.
  • CiccioB - Thursday, May 11, 2017 - link

    At those prices, yields are for "poor" companies.
    Two working GPUs per wafer and they'd still make money.
  • fanofanand - Thursday, May 11, 2017 - link

    Jen is that you? You seem to be shilling on this thread pretty hard. Team Green sending you greenbacks for your incessant trolling?
  • Ranger1065 - Friday, May 12, 2017 - link

    Could it really be the ghost of Chizoo?
  • Gasaraki88 - Thursday, May 11, 2017 - link

    They have to sell the V100 at high cost because the yields are so low. As yields improve, cost drops to a point where you can sell something to regular consumers.
  • Gasaraki88 - Thursday, May 11, 2017 - link

    It's all about volume. Consumer cards are cheap, but they can sell more of them than fifteen-thousand-dollar specialized cards. So while it's true consumer cards probably fund R&D for future cards, they need to initially sell the low-volume, low-yield cards to pay for the consumer cards.
  • allanmac - Wednesday, May 10, 2017 - link

    This math doesn't add up:

    "a single Tensor Core performs the equivalent of 64 FLOPS per clock, and with 8 such cores per SM, 1024 FLOPS per clock per SM"

    Typo?
  • flameyyy - Wednesday, May 10, 2017 - link

    Yeah, it's 1024 FLOPS/clock vs 256 for a TPC, not an SM.
  • Ryan Smith - Wednesday, May 10, 2017 - link

    The math is correct. It's 64 FMAs per clock, with 8 Tensor Cores per SM. So 64 * 2 (FMA) * 8 = 1024.

    FP16 is 64 * 2 (FMA) * 2 (Vec2) = 256.

    But I've edited the passage to make it clearer.
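    Spelled out as a quick sketch of that arithmetic (per SM, per clock):

        # Tensor Cores: 64 FMAs per core per clock, 2 FLOPs per FMA, 8 Tensor Cores per SM.
        tensor_flops_per_sm_per_clock = 64 * 2 * 8    # 1024
        # Standard FP16 path: 64 FP32 cores per SM, 2 FLOPs per FMA, 2-wide (vec2) FP16.
        fp16_flops_per_sm_per_clock = 64 * 2 * 2      # 256
        print(tensor_flops_per_sm_per_clock, fp16_flops_per_sm_per_clock)   # 1024 256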
  • fanofanand - Wednesday, May 10, 2017 - link

    Moments like this remind me of what a Luddite I really am.
  • allanmac - Wednesday, May 10, 2017 - link

    Also, is the Tensor Core actually performing a "matrix multiply" or just an element-to-element multiply (tiled FMA)? Pretty sure it's the latter.
  • qap - Wednesday, May 10, 2017 - link

    The defining parameter here is 64 FLOPS per clock per tensor core. Whether it is grouped under a single instruction as a matrix multiplication (which is what it looks like - because if you want to calculate the product of two 4x4 matrices you need 64 MADD ops) or you need to call 4x simple MADDs on matrix elements is not really important.
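    For illustration, a toy sketch of where the 64 multiply-add count for a 4x4 product comes from (hypothetical all-ones matrices, purely to count the operations):

        # Naive D = A*B + C on 4x4 matrices: 16 output elements, 4 multiply-adds each = 64 MADDs.
        def madd_count_4x4():
            n, madds = 4, 0
            A = [[1.0] * n for _ in range(n)]
            B = [[1.0] * n for _ in range(n)]
            C = [[0.0] * n for _ in range(n)]
            D = [[0.0] * n for _ in range(n)]
            for i in range(n):
                for j in range(n):
                    acc = C[i][j]
                    for k in range(n):
                        acc += A[i][k] * B[k][j]   # one multiply-add
                        madds += 1
                    D[i][j] = acc
            return madds

        print(madd_count_4x4())   # 64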
  • Pino - Wednesday, May 10, 2017 - link

    But, can it run Crysis?
  • galta - Wednesday, May 10, 2017 - link

    only if you game @1080p.
  • vladx - Wednesday, May 10, 2017 - link

    Yes, and at 8K resolution, no less.
  • lamebot - Wednesday, May 10, 2017 - link

    Most modern GPUs can run Crysis, even the lower powered ones. This question is no longer relevant.
  • Arbie - Wednesday, May 10, 2017 - link

    It's just as relevant as it was in 2007.

    When - in fact - all then-modern GPUs would run the game. Though some would deliver only low FPS at rock-bottom settings. I ran it on a year 2000 Pentium PC with a GeForce 4 256MB card.
    The idea that you needed a beyond-top end rig just to play it was a canard foisted on the public by haters and clickbait artists. This dealt Crytek a blow which may ultimately have been fatal. What a shame. IMO there has not been a game since that equaled it, and only its direct successors even compare.
  • nintendoeats - Wednesday, May 10, 2017 - link

    This very website had to use four GTX 9800s to get an average of just under 60 FPS at full settings and 1920 by 1200. They did call using tri-SLI "absolutely playable". Yes, it can be run at lower settings and resolutions, but you are losing a huge part of the game (and I think the gameplay is great). I think the question "will it run Crysis" really means "will it run Crysis and actually FEEL like Crysis".

    In this case, no, because I doubt these chips support DirectX.
  • frenchy_2001 - Wednesday, May 10, 2017 - link

    GV100 is a fully enabled graphics chip with killer perfs.
    Using it as such is missing most of the point and the features dedicated to compute (AI and tensors particularly), but it would run (probably like a beast). Take for reference the GP100, which started its life in the previous Tesla P100 and ended up in a workstation card, the Quadro GP100. As stated then, it had a few advantages compared to the GP102 in the Quadro P6000, including better FP16 perfs and more and faster memory, but worse FP32 and graphics performance.
    Expect GV100 to behave similarly: top of the hill in HPC/AI/Tensor, but beaten by GV102 in graphics (both games and pro) in the next Titan. Still the best there is until such a GV102 is released, though...
  • CiccioB - Thursday, May 11, 2017 - link

    Differences in gaming performance between GP102 and GP100 are due to the difference in clocks and the fact that games are optimized for GP102 drivers, certainly not for GP100 (which has a slightly different SM configuration).
  • jjj - Wednesday, May 10, 2017 - link

    So next week a new Titan? LOL

    They did go all out today, and it's the first real salvo in this deep learning race.
    They don't get stuck on the GPU being a GPU; they push CUDA hard, and it will be difficult for others to catch up.

    On the PC side, Volta looks encouraging; a pity that Vega is a year late, but AMD will need to do much better next time. Hopefully Volta plays nice with low-level APIs too...
  • Meteor2 - Thursday, May 11, 2017 - link

    All V100 production is going into HPC this year -- the DoE supercomputers, and the ones we don't know about. It won't show up in consumer for about a year.
  • Eden-K121D - Wednesday, May 10, 2017 - link

    Jensen Huang is obsessed with being the best and that is a good thing.
  • CiccioB - Thursday, May 11, 2017 - link

    He has always been like this. It's also what allowed his once-minuscule company to roll over much bigger companies in GPU manufacturing. He has always aimed for the best and the highest, sometimes failing, but never stopping to cry about it.
    He has opened new markets where before there were none.
  • Ranger1065 - Friday, May 12, 2017 - link

    OMG it IS Chizow.
  • Flunk - Wednesday, May 10, 2017 - link

    Consumer Volta mid next year then?
  • Yojimbo - Wednesday, May 10, 2017 - link

    This architecture has so many compute-oriented optimizations that it's hard to know if their graphics GPUs have diverged from their big data center GPU even more than in the Pascal generation. If there's considerable divergence, then consumer Volta in early 2018 seems reasonable to me.

    A major principle of NVIDIA, however, is that their products are all driven by a common architecture. If the consumer GPUs are still largely very similar to this GV100 then I'd guess that consumer Volta should appear before the end of 2017, perhaps in September or October.

    The wildcard would be if NVIDIA considers 12nm FFN too costly for GeForce GPUs at the moment and either waits for the cost of the process to go down or puts GeForce GPUs on 16nm FF+. In that case I guess the architecture could be very similar while the consumer cards are still delayed until early 2018.
  • CiccioB - Thursday, May 11, 2017 - link

    The 12nm manufacturing process should be cheaper than the 16nm one.
  • Yojimbo - Thursday, May 11, 2017 - link

    Why?
  • CiccioB - Thursday, May 11, 2017 - link

    It uses fewer masks.
  • Yojimbo - Thursday, May 11, 2017 - link

    How do you know that?
  • CiccioB - Friday, May 12, 2017 - link

    I was wrong. It is FFC which is cheaper. This is FFN.
  • Meteor2 - Thursday, May 11, 2017 - link

    I think it will be around this time next year because of supply constraints as much as anything.
  • boozed - Wednesday, May 10, 2017 - link

    Alt-0178 for the ² character.
  • Yojimbo - Wednesday, May 10, 2017 - link

    Something I thought was interesting that you didn't choose to include in your overview is that the GV100 has 64 dedicated INT32 cores per SM, which NVIDIA says allows for "simultaneous execution of FP32 and INT32 operations at full throughput, while also increasing instruction issue throughput." I wonder what the use case for that is.

    Between the independent scheduling, lowered dependent instruction issue latency, enhanced L1 data cache, flexible L1 cache/shared memory configuration, and enhanced unified memory support, it looks like NVIDIA is trying very hard to make it easier to extract data parallelism from code and to make GPUs easier to program for.

    I wonder what sort of enhancements are in store for Volta for graphics workloads.
  • Yojimbo - Friday, May 12, 2017 - link

    Seems the independent, parallel integer and floating point data paths are there so that Volta is "more efficient on workloads with a mix of computation and addressing calculations."
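    As a rough illustration of that mix (my own sketch, not from NVIDIA's documentation; gather_saxpy and its parameters are made up for the example), consider an indirect SAXPY kernel: the thread-index and address arithmetic compile to INT32 instructions while the actual math is an FP32 multiply-add, so per NVIDIA's description Volta can issue the two side by side instead of having them fight over the same cores:

    // Hypothetical example: indirect SAXPY. The index math runs on the INT32
    // pipe, the fmaf() on the FP32 pipe; per NVIDIA, Volta can issue both at
    // full throughput simultaneously, which is exactly the "computation plus
    // addressing calculations" case they describe.
    __global__ void gather_saxpy(float *y, const float *x, const int *idx,
                                 float a, int n, int stride)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // INT32: thread index
        if (i < n) {
            int j = idx[i] * stride;                    // INT32: address calculation
            y[i] = fmaf(a, x[j], y[i]);                 // FP32: multiply-add
        }
    }

    Launched as something like gather_saxpy<<<(n + 255) / 256, 256>>>(y, x, idx, a, n, stride). On Pascal and earlier, the integer and floating-point instructions share the same cores, which is what the dedicated INT32 units change.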
  • Frenetic Pony - Wednesday, May 10, 2017 - link

    Weird that they're "unveiling!" it now. TSMC's 12nm literally just entered production, so there can't be any availability; HBM2 is rumored to have super low yields, so no availability there either; and it'd be really surprising if Apple didn't buy up all the 12nm runs for the iPhone 8 like they do every year.

    But I guess they want to pre-sell as much as possible? At the profit margins for $15k a chip I'd want that as well.
  • Yojimbo - Wednesday, May 10, 2017 - link

    NVIDIA have been using HBM2 for almost a year in the Tesla P100 and also the Quadro GP100. Availability is there; one just has to pay the price for it. The price is easier to swallow for a several-thousand-dollar data center card than for a consumer graphics card. Apple isn't using 12nm for their upcoming iPhone SoC; they are using 10nm. 12nm is a refinement of the 16nm node, not an entirely new node.

    At NVIDIA's earning conference call that was held the day before the GV100 reveal, an analyst noted an increase in NVIDIA's inventory levels as of April 30th, 2017. Jensen Huang said that it was due to a new product they were building inventory of that people should watch his keynote the next day to find out about. That product turned out to be the GV100. So the chips are really being manufactured and accumulated. As far as availability, orders for NVIDIA's GV100-containing DGX-1V server have begun today, with deliveries beginning in Q3 of this year (July through September). Additionally, GV100 chips will be delivered this year for the building of the Summit and presumably Sierra supercomputers.
  • sonicmerlin - Wednesday, May 10, 2017 - link

    Meanwhile at AMD, Vega is... nowhere to be found. Jeez, talk about a performance discrepancy. Has the gap between the two companies ever been this large?
  • beginner99 - Thursday, May 11, 2017 - link

    Why would NV disclose so much info about this several months before the actual launch? Why? There is only one reason: to tell people to wait and not buy Vega once it is released. This also means NV is a bit "scared" of Vega being a bit too competitive.
  • renz496 - Thursday, May 11, 2017 - link

    I really doubt NVIDIA is scared of AMD's Vega. Vega could be very good hardware-wise, but what about the software? Just look at what happened between Hawaii and GK110/210 in the HPC space. When NVIDIA ditched FP64 with the Maxwell design, AMD had a very good opportunity to offer a significantly better DP accelerator with Hawaii. Power-wise, Hawaii and GK110/210 consume about the same, but Hawaii simply had the better DP performance thanks to its 1:2 DP:SP ratio: roughly 2.5 TFLOPS of DP versus about 1.5 TFLOPS for NVIDIA, a 67% raw performance advantage. And yet how did the market react? The majority of HPC clients kept using NVIDIA GK110/210 solutions, and most existing systems waited for NVIDIA Pascal and Intel KNL instead of upgrading their machines with the AMD S9150.

    If anything, Volta is NVIDIA's counter to the Google TPU, or any other ASIC-based tensor processor, before they really gain ground.
  • Yojimbo - Thursday, May 11, 2017 - link

    The reason they are disclosing so much info about this now is two-fold. 1) It's GTC. That's what GTC is for. To give information to developers. 2) Developers want to know about the workings of a new architecture as soon as possible, especially the supercomputing crowd that wants to use Summit. They have a lot of code that they have to scale and optimize properly for the new supercomputer. They can't do that properly without information on the architecture.

    If NVIDIA really were worried about Vega and wanted to rain on its parade they would have revealed GV104 as well, even if it doesn't come out for 7 months, just like AMD did with Vega.
  • Yojimbo - Thursday, May 11, 2017 - link

    I saw a rumor that AMD plans to have only 16,000 units of Vega available for the first few months. Frankly, I wonder if it might in fact be true. HBM2 prices seem to be high, and the fact that AMD has been showing off Vega in demos for 6 months makes me wonder if it's a product they don't actually want to bring out.

    AMD perhaps put most of their resources into Zen, which looks pretty good and looks like it will provide them with some success. But in doing so they might have let their graphics efforts fall off quite a bit, focusing mainly on maintaining their ability to supply the console market and leaving the high end to NVIDIA. They announce products way in advance to keep their investors happy and their stock price up, and then they delay the release of those products as long as possible. When they finally release them, I wouldn't be surprised if those products are very difficult to actually buy, because they are unable to configure them competitively and sell them profitably at the same time. Maybe it's a bit conspiracy theory-ish, but I'm starting to believe it might have some truth to it.
  • CiccioB - Thursday, May 11, 2017 - link

    Yes, it was at the G80 launch against R600.
    That was a real gap, one AMD only managed to close a couple of years later.
    Compared to back then, though, the gap has never been this large for this long. And it is increasing.
    That's what's alarming.
  • renz496 - Thursday, May 11, 2017 - link

    Isn't Apple going to use 10nm? Btw, as mentioned in the article, this new process is called 12nm FFN, and that N refers directly to NVIDIA; it is a process customized specifically for NVIDIA's architecture. Not even sure whether this process would be suitable for an Apple SoC. NVIDIA said they spent 3 billion in R&D to develop Volta, and I heard that for Pascal alone NVIDIA spent 2 billion. That 3 billion for Volta probably includes the deal they have with TSMC to make a process that can only be used by their GPUs.
  • vladx - Thursday, May 11, 2017 - link

    " btw as mentioned by the article this new process is called 12nm FFN. that N is directly refer to nvidia."

    No, 12nm FFN = 12 FinFET Node
  • vladx - Thursday, May 11, 2017 - link

    And apparently TSMC has orders from other companies for this node:

    "Taiwan Semiconductor Manufacturing Company (TSMC) has secured 12nm chip orders from Nvidia, MediaTek, Silicon Motion Technology and HiSilicon for the fabless firms' different chip products, according to industry sources."
  • Ryan Smith - Thursday, May 11, 2017 - link

    12nm FFN is an NVIDIA custom node, and the N does in fact stand for NVIDIA. It is a variant of TSMC's standard 12nm node, with a tweaked ruleset for higher performance as requested by NV. This info comes straight from NVIDIA's engineers.
  • vladx - Thursday, May 11, 2017 - link

    Hmm, then what about my quote above? Does that mean TSMC made a custom node for each of those customers?
  • Yojimbo - Thursday, May 11, 2017 - link

    Maybe, but those other companies are probably using CLN12FFC.
    http://www.anandtech.com/show/11337/samsung-and-ts...

    TSMC has a low-power 12nm process that they have previously discussed publicly. But the 12nm process variant that NVIDIA is using is high-power capable and seemingly hasn't been mentioned publicly before.
  • vladx - Friday, May 12, 2017 - link

    Yeah that can explain it, thanks.
  • Meteor2 - Thursday, May 11, 2017 - link

    It's just their spoiler ahead of Vega (due in the next couple of weeks).
  • ABR - Thursday, May 11, 2017 - link

    Tensor units. It would be interesting to know the relationship between this and Google's TPU. Were they collaborating, or is this a move to fend off any plans Google might have to sell externally?
  • Qwertilot - Thursday, May 11, 2017 - link

    Likely more of a move to try and fend off other companies producing special-purpose hardware for this sort of thing. It's a big market already and potentially growing fast, so that sort of competition is pretty inevitable.
  • Meteor2 - Thursday, May 11, 2017 - link

    I was wondering that too. I don't think Google would sell TPUs; I think it's simply a good idea that Nvidia have chosen to incorporate (they do seem to have included the kitchen sink).
  • Chadsterbot1975 - Thursday, May 11, 2017 - link

    It seems like a lot of comments concerning AMD are aimed at taunting them for not keeping up with Nvidia. I honestly don't think that is AMD's intention at all. Given the strategy they've employed with Ryzen, it doesn't seem like they're even trying to compete one-for-one with Nvidia's hardware. Ryzen was aimed at the 'value gaming' market: delivering gear that offers excellent performance for half the price or better. AMD's CPUs are frequently tops for value performance and are only bested when someone sells a bunch of old Xeons at an absurdly low price and that shows up on Passmark's value list. As for GPUs, AMD's RX series has been taking turns with Nvidia's GTX 1060 3GB for top honors in value gaming for many weeks now. At the time of this writing, AMD's RX 460 at $89.99 is king of the hill in value on Passmark.

    To me the funniest thing about this is how many VERY intelligent people seem to be making the assumption that Nvidia and AMD are even competing with each other. It's as if people forgot that AMD got its start working with Intel, second-sourcing their designs. GPU-wise there are effectively only two companies on the planet making discrete graphics cards. Any competition between these two companies is really only in the minds of the fanboys. Nvidia goes high, AMD goes low, and they both win by controlling just about 100% of the market between them. It only SEEMS like Intel and Nvidia are top dogs. Penny-for-penny in gaming, it's almost impossible to beat a Ryzen 5 1600 CPU with an RX 460 GPU with any combination of Intel/Nvidia in their price range right now.
  • CiccioB - Thursday, May 11, 2017 - link

    <blockquote>To me the funniest thing about this is how many VERY intelligent people seem to be making the assumption that Nvidia and AMD are even competing with each other.</blockquote>
    This is the most "unintelligent" justification I have read for trying to excuse AMD's less-than-par architecture.
    You are comparing prices, not costs.
    The 460 runs circles around what, exactly?
    It's like listening to the enthusiasts who celebrated the 380 being (marginally) faster than the 960.
    If you can't do a serious comparison between the two companies' products, please don't ridicule yourself by doing a simple price comparison to claim that one company is voluntarily not competing with the other.

    Do you think the "great and marvelously efficient" Polaris 10 was built to go against the smaller, less power-hungry GP106, or against the similarly resourced GP104?
    Same shader count, same memory width, more TMUs (yes, fewer ROPs), only to end up 50% behind. Maybe that is why you say they do not compete: because AMD can't, EVEN IF IT TRIES!
    And it tries hard... see Fiji... see Vega... they try and try, but they simply can't. Not with this GCN architecture, which gets presented and then ridiculed three months later.
    They have to change it. Only then might they have a chance. For now they are just trying to stay afloat and not be swallowed by the big nvidia whale.
  • Ranger1065 - Friday, May 12, 2017 - link

    Spoken as a true Jen-Hsun acolyte. Long live the faithful.
  • CiccioB - Monday, May 15, 2017 - link

    Ahahahah... long live those who can't understand reality.
    Facts are facts, whatever you dream about at night.
  • Meteor2 - Thursday, May 11, 2017 - link

    Well said Chad. Nvidia and AMD have both made intelligent choices to maximise their returns. It's Intel which seems to be in a little bit of trouble.
  • garbagedisposal - Thursday, May 11, 2017 - link

    You are delusional, of course AMD and nV are competing with each other. The reason why AMD can't keep up with them and don't intend to is because they're not capable of doing so.

    GCN is nearly two generations behind nvidia in performance/watt, substantially behind in perf/area, and each GPU nets them less profit both on an absolute and relative basis over their costs - and it's been like this since GCN arrived. It's an inferior architecture with drivers that lack performance out of the gate. RX is a 'value gaming' market GPU because it's not competitive. Informed people only love RX so much because AMD realized they had no option other than to sell them for cheap and claim they're doing it on purpose. At least that way they can win over some mindshare (thankfully it worked).

    Want a high end radeon GPU? Impossible because it would blow out the power and die budgets compared to any alternative from nvidia - you couldn't sell it competitively and make a profit. (they aren't making much money from the midrange RX series as it is)

    There's no CUDA. They're totally locked out of the laptop market, server market, datacenter, AI, automotive; wherever the $$$ is being made, radeon isn't there because it's way behind. Nvidia GPUs are the standard and becoming increasingly more so, despite the investments AMD have been making in software. Too little, too late. Whatever AMD manages to put out with their measly R&D budget is simply not a threat. They don't have a sustainable GPU business, which is why they have to keep rebranding and make only the cheapest tweaks to GCN every year (they don't even manage yearly updates anymore, compared to nvidia's surprisingly quick turnarounds of their entire lineup - it's like clockwork).

    Vega is late and competing with an architecture that will probably beat it handily in terms of pure technology. AMD are probably going to have to price it low to compete, which will be great for PC builders but not sustainable for the business. The truth is that the 300 series is just an overclocked 200 series, Tonga & Fiji were barely any improvement over the 200/300 series in IPC & density, RX was just a die shrink because they couldn't afford anything else, and Vega will be a die-shrunk Fiji with some tweaks that will barely help gaming performance. Rumors put Vega near the 1080, and if so that will be a major disappointment.

    When was the last time you saw a top-to-bottom refresh on the red team that wasn't mostly rebrands? Many years ago.

    The dedicated triangle binner in the 480? They had to do it because of nv's shenanigans - tell me, when you have to waste die space in your products due to decisions your competitors have made, what does it say about the state of the market (read: your weak role in it) and the influence of your competitors?

    Even when the 480 and 1060 are pretty much identical in performance (except for the fact that the 480 is less efficient and costs more b/c 1. it has a larger die area and 2. it needs more ram per card), it doesn't instill much confidence when I tell you that the 1060 outsells the 480 at least 2 to 1, does it?

    AMD's RX might look great to us, but the graphics side of their company is not sustainable. I actually wonder if they'll still make GPUs in 10 years if vega is the stinker I think it's going to be.

    This is coming to you from a hardcore AMD fanboy. I hate buying intel or nvidia because of their shitty ethics, but we have to be objective. If I had to make a million dollar bet I wouldn't put it on AMD.
  • StrangerGuy - Monday, May 15, 2017 - link

    AMD can only blame their own past hubris; remember who gave JHH the middle finger when he offered to lead a combined NV-AMD?

    I'm not even going to get into the hypocritical fanboys bragging about their $250 290X cards while whining about how NV makes too much money while AMD bleeds red ink all the time.
  • hlovatt - Thursday, May 11, 2017 - link

    Seems like TSMC is exaggerating the process improvements. Taken at face value, 16 nm to 12 nm implies (16/12)^2, roughly 78% more transistors in a given area. In fact, they are only getting about 3% more transistors. Should have called the new process 15.8 nm!
  • vladx - Thursday, May 11, 2017 - link

    It's just marketing, just like TSMC's 16nm and Samsung's 14nm aren't actually 14 or 16 nm in reality.
  • CiccioB - Thursday, May 11, 2017 - link

    It may also imply 40% less energy consumption, which seems to be what they are delivering with this new process.
    Density is not the only parameter to consider for a process node.
  • CiccioB - Thursday, May 11, 2017 - link

    Oh, and I forgot. It is also 40% cheaper than 16nm.
  • beck2050 - Thursday, May 11, 2017 - link

    Very impressive. The consumer variants will be insane.
  • HollyDOL - Thursday, May 11, 2017 - link

    I think Volta's double precision peak should be 7.5 TFlop (in the table)
  • TristanSDX - Thursday, May 11, 2017 - link

    Such a chip does not make sense for AI. If these additional 6bn transistors account for 40% more shaders plus the tensor cores, that means the tensor cores require only 1-2bn transistors. Using Volta for AI will be ridiculous from a financial perspective. That's why some companies have ditched NV, with its big, costly, power-hungry chips, and are designing small, cheap, fast AI solutions.
  • vladx - Thursday, May 11, 2017 - link

    You do know AI domain is segmented into different categories and levels, right?
  • CiccioB - Thursday, May 11, 2017 - link

    6 billion transistors for 40% more FP32 ALUs, FP64 ALUs, INT32 ALUs, FP16 throughput, the tensor cores, and a good deal more cache.
    I would say that's not a bad deal, given what they can do with the power they use.
  • MadManMark - Sunday, May 14, 2017 - link

    Do you understand the difference between training and inference? Between processing power and latency?
  • Sweetbabyjays - Thursday, May 11, 2017 - link

    After Volta i think their next architecture will be called Skynet...
  • CiccioB - Thursday, May 11, 2017 - link

    Yet there's no indication of the names of the next architectures.
    Will we get a real revolution that also changes the naming conventions?
  • Sweetbabyjays - Thursday, May 11, 2017 - link

    @CiccoB if by revolution you mean uprising...then yes...
  • Ranger1065 - Friday, May 12, 2017 - link

    Nvidia shill.
  • Meteor2 - Thursday, May 11, 2017 - link

    That made me smile.

    Personally, call me a crackpot, but I think we are going to get to that point. Britain and France are collaborating on building a stealthy autonomous hunter/killer UAV. We've got bipedal robots which can run. Cars which drive themselves. The pieces are there...
  • Sweetbabyjays - Thursday, May 11, 2017 - link

    @Meteor2 With AI technology, integration leads to reliance and reliance leads to submission.
  • vladx - Friday, May 12, 2017 - link

    There won't be any real danger as long as they build not only a remote kill switch, but a local one as well.
  • pppp6071 - Thursday, May 11, 2017 - link

    I don't know where TSMC gets the names for their processes, but their "12nm" doesn't seem to be any denser than their previous "16nm".

    GV100: 21000Mtra / 815mm2 = 25.77MTra/mm2
    GP100: 15300Mtra / 610mm2 = 25.08MTra/mm2
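    Extending that, a quick back-of-the-envelope sketch of my own (using the same figures as above) of what a literal 16nm -> 12nm shrink would imply versus what GV100 actually delivers:

    #include <stdio.h>

    int main(void)
    {
        double gp100 = 15300.0 / 610.0;   /* GP100 density, ~25.08 MTr/mm^2 */
        double gv100 = 21000.0 / 815.0;   /* GV100 density, ~25.77 MTr/mm^2 */

        double nominal = (16.0 / 12.0) * (16.0 / 12.0);  /* ~1.78x if node names were literal */
        double actual  = gv100 / gp100;                  /* ~1.03x in practice */

        printf("nominal 16nm -> 12nm density scaling: %.2fx\n", nominal);
        printf("actual GV100 vs GP100 density:        %.2fx\n", actual);
        return 0;
    }

    A name-accurate shrink would be roughly 78% denser; the actual gain is under 3%, which fits the view that 12nm FFN is an optimized 16nm rather than a genuinely smaller node.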
  • TallestJon96 - Thursday, May 11, 2017 - link

    I came here to say the exact same thing; the density is barely better. I think the big focus here is power efficiency. It's hard to say if it's the new node or the new architecture, but this is quite a bit more efficient: near 50% more performance in the same power envelope.

    From what I've heard, it really is just an optimized 16nm node, so a little more density and a little lower power consumption.
  • yhselp - Thursday, May 11, 2017 - link

    It seems like the increase in functional units is roughly equivalent to the increase in die size.
  • SanX - Thursday, May 11, 2017 - link

    $3 per core... way too overpriced...

    But since performance is often bound by memory and interconnect speed, it would be interesting to see a comparison on PIC codes against Intel cores; if this gear has finally started to catch Intel, maybe $3 is way too cheap.
  • knirfie - Thursday, May 11, 2017 - link

    "this is NVIDIA’s flagship GPU for compute, designed TO DRIVE THE NEXT GENERATION OF TESTLA PRODUCTS."

    Pun intended I asssume?
  • vladx - Friday, May 12, 2017 - link

    Well a downscaled version might run in future Tesla cars as well.
  • SanX - Friday, May 12, 2017 - link

    Behind the hype for the unwashed masses, we see $3B thrown at just a 20% improvement.
  • vladx - Friday, May 12, 2017 - link

    20% in gaming maybe, but 5-10x in high compute and machine learning.
  • Yojimbo - Friday, May 12, 2017 - link

    I am guessing it should be a 40% improvement in performance per watt in gaming based on the peak theoretical throughput alone. If there are other architectural enhancements geared towards gaming it could possibly be more. It looks like a 40% improvement will require a larger die, though, but manufacturing costs should be cheaper now than they were when 16FF+ was new. NVIDIA will probably give us larger dies just as they did with the 900 series over the 700 series and the 700 series over the 600 series. Dies getting bigger as a process matures is a natural progression.
  • versesuvius - Friday, May 12, 2017 - link

    The biggest microchip in the world! There was a time that only the USSR could make those chips!!!
  • cocochanel - Saturday, May 13, 2017 - link

    Why create such a monster chip? Where is this going? Why not use multiple GPUs in parallel, but instead of the dreadful SLI, create a motherboard with multiple GPU sockets, just like server motherboards for CPUs? Or is the idea crazy? Can anyone explain?
    Thanks.
  • MadManMark - Sunday, May 14, 2017 - link

    CocoChanel, you do understand that this is an HPC & not gaming GPU, right? Your comment makes no sense.
  • versesuvius - Monday, May 15, 2017 - link

    HPC GPU? That makes even less sense, mainly because it is the same GPU with more cores, so to speak. There is nothing that two 980 Ti GPUs working together cannot do almost twice as fast as this one, more efficiently too, and for a lot less money. Of course, if you want to put a thousand or more of these together then Nvidia has a point, but even then it sets you back millions of dollars that could be put to better use in any given job. This chip is propaganda and nothing else.
  • peevee - Monday, May 15, 2017 - link

    "There is not anything that two 980 Ti GPUs working together cannot do almost twice as fast as this one and more efficiently too"

    They are not going to be twice as fast in normal loads.
    They will be A LOT slower in fp16 matrix loads.
    And of course they will not be more efficient.

    But otherwise I agree.
  • versesuvius - Monday, May 15, 2017 - link

    That should have been two 1080 Ti GPUs.

    And I agree with you too. :)
  • CiccioB - Monday, May 15, 2017 - link

    It's propaganda only if you can't sell it; otherwise it means it satisfies someone's needs at its price.

    Apart from the fact that two 980 Tis can't do a quarter of the compute work of this card, do you know that GM200, mounted on the 980 Ti, has crippled DP throughput, no fast INT8 path, and FP16 that runs no faster than FP32 instead of twice as fast?
    Having said that, you have not understood how powerful this beast is and why nvidia is able to sell it at the offered price.
    Keep on playing with consumer-grade GPUs.
  • versesuvius - Monday, May 15, 2017 - link

    Two 1080 Ti GPUs.

    "Keep on playing with consumer grade GPUs".

    Will too. That is where all the fun is.
  • radioguy728 - Thursday, May 18, 2017 - link

    Is it worth it?
  • beck2050 - Saturday, May 27, 2017 - link

    Amazing how Nvidia keeps leaping ahead.
