72 Comments
Kurosaki - Thursday, August 19, 2021 - link
Welcome to break the cartel!
Hifihedgehog - Thursday, August 19, 2021 - link
Your daily reminder that this GPU also won't be reviewed on AnandTech :'(
TelstarTOS - Friday, August 20, 2021 - link
Which sucks badly. AT was the first place I went for GPU reviews and benchmarks.
Like 2trip - Monday, August 23, 2021 - link
I'm new here, so what is this in reference to?
mode_13h - Tuesday, August 24, 2021 - link
That they used to do excellent GPU reviews, but sadly haven't in years.
Oxford Guy - Friday, August 20, 2021 - link
Yay! Another GPU designed for mining.
I hope I’m wrong about that, but if Intel has plans for differentiating itself from the woeful competition (woeful for PC gamers, not for investors in Sony, MS, AMD, et cetera) by making GPUs for PC gaming rather than mining, I didn’t read anything about it here.
(I don’t like the ‘AAA’ PC games but like the console scam and other monopolist pricing shenanigans less.)
mode_13h - Saturday, August 21, 2021 - link
If it were truly designed for mining, they'd have left off the RT cores and XMX units. Those don't help with mining, at all.
By your definition, it seems like every GPU is designed for mining. The problem is that when mining algorithms are designed to run well on GPUs, you basically have to break the very definition of a GPU to end up with something that doesn't mine well, and I don't see how you do that without also breaking gaming performance.
Oxford Guy - Monday, August 23, 2021 - link
‘you basically have to break the very definition of a GPU to end up with something that doesn't mine well, and I don't see how you do that without also breaking gaming performance.’
So you’re claiming that Nvidia is utterly lying.
mode_13h - Tuesday, August 24, 2021 - link
> So you’re claiming that Nvidia is utterly lying.
They've had *all kinds* of problems with their hash-rate limiting. It's not so easy, because it's an artificial limitation that has to be engineered, and it ends up being something of an arms race, with the crypto miners finding ways to circumvent it.
taltamir - Tuesday, August 24, 2021 - link
When has Nvidia not been lying?
mode_13h - Saturday, August 21, 2021 - link
The funny thing is that you missed how it *is* designed to accelerate neural network inferencing! According to the article, the XMX units take up even more space than Nvidia's Tensor cores. So, that's some real cost to consumers for a feature that's probably targeted mostly at cloud-oriented use cases.
And yes, Intel puts their dGPUs on boards without display connectors, for use in the cloud:
https://www.h3c.com/en/Products_Technology/Enterpr...
mode_13h - Saturday, August 21, 2021 - link
To date, your most-hated GPU maker (AMD) has not done this. RDNA GPUs still have no Matrix cores, yet their CDNA chips do.
Oxford Guy - Monday, August 23, 2021 - link
‘your most-hated GPU maker’
Another day, another ad hom from Mode’s ouija board, tarot cards, or what have you.
Oxford Guy - Monday, August 23, 2021 - link
And, of course, the issue of how the products are sold can mostly solve the mining problem.
That doesn’t mean more abuse of the consumer (i.e. bundles — with or without PSUs that catch fire).
It means establishing a supply chain that ends with gaming buyers rather than miners. Not particularly difficult to do. The key is that the companies need to want to do it, rather than trying to play every side whilst happily catering to miners.
Oxford Guy - Monday, August 23, 2021 - link
Since corporations are amoral by design, the only thing that can force them to do what I said is for consumers to stop paying to be cheated.
mode_13h - Tuesday, August 24, 2021 - link
> the only thing that can force them to do what I said
No, you're completely ignoring the role of government.
Oxford Guy - Thursday, September 2, 2021 - link
'No, you're completely ignoring the role of government.'
Corporations and government are synonymous.
mode_13h - Tuesday, October 5, 2021 - link
> Corporations and government are synonymous.
No, they're really not. Also, it depends quite a bit on what country you're talking about. In the US, there's too much corporate influence, for sure. But it's far from total. It's not even hard to see that, if you actually bother to look.
Some of those claiming otherwise would include the intellectually lazy, social media disinformation trolls, and those seeking to blame society for their personal failures and misfortunes.
mode_13h - Tuesday, August 24, 2021 - link
> the issue of how the products are sold can mostly solve the mining problem.
And how is it going to do that?
It's not AMD who's selling most of these graphics cards, it's their board partners. And screwing over their board partners, so that AMD can sell direct-to-consumers is ultimately going to be self-defeating for them, and that's not going to help consumers.
Oxford Guy - Thursday, September 2, 2021 - link
'It's not AMD who's selling most of these graphics cards, it's their board partners. And screwing over their board partners, so that AMD can sell direct-to-consumers is ultimately going to be self-defeating for them, and that's not going to help consumers.'
Like MSI, the company that raised its prices to match eBay scalper pricing?
I think you should rethink your priorities. Catering to the likes of MSI and exploding-PSU Gigabyte may be important to you. But it in no way is a rebuttal to my point.
AMD could greatly reduce the mining problem but it doesn't want to.
mode_13h - Tuesday, August 24, 2021 - link
> another ad hom
First, it doesn't count as an "ad hom" if it's true. Second, if you consider hating AMD to be a character flaw, then it seems like you've got some self-reflection to do. Third, don't think we don't see you changing the subject, to distract from the fact that AMD is actually screwing gamers less than its competitors.
Oxford Guy - Thursday, September 2, 2021 - link
'First, it doesn't count as an "ad hom" if it's true.'
You have no understanding of what a fallacy is.
mode_13h - Tuesday, October 5, 2021 - link
> You have no understanding of what a fallacy is.
Your definition seems to be any post you don't like, but can't or won't answer with a compelling and coherent counterargument.
Einy0 - Thursday, August 19, 2021 - link
Fingers crossed, this segment is in severe need of supply and competition.
dotjaz - Thursday, August 19, 2021 - link
Competition aside, how is this increasing supply?
jeremyshaw - Thursday, August 19, 2021 - link
Simple :D
Some people wanted AMD to allocate more 7nm/6nm wafers to GPUs. Now Intel is allocating more of TSMC's production lines for GPUs, regardless of what AMD wants.
TheJian - Saturday, August 21, 2021 - link
Well, we know these wafers won't be wasted on console SoCs, right? That's a win right there. Why make SoCs for peanuts when you can make 3090s, server, HEDT, etc. for thousands? You don't get 60% margins from consoles, and it's even worse trying to do it in a wafer shortage... LOL.
‘Consoles’ are a Trojan horse to artificially inflate the pricing of ‘PC gaming’ GPUs.
More parasites = more profit. That is what those redundant walled software schemes are. Pure parasitic redundancy. Paying the MS tax for Windows is bad. Paying for duplications of that over and over is worse.
(The console used to exist. It hasn’t since Jaguar chips were foisted onto the unwitting.)
mode_13h - Tuesday, August 24, 2021 - link
> ‘Consoles’ are a Trojan horse to artificially inflate the pricing of ‘PC gaming’ GPUs.
If this were true, then PC gaming prices should've been inflated ever since modern consoles have existed, not just for the past couple of years.
But, of course it's nonsense. You're just pouting like a child that you can't afford the gaming GPU you want, and casting about for a scapegoat to receive your seething rage.
All I can say about that is that it's not solving your problem. No matter how much you rant about consoles and AMD, you're still not going to have the gaming GPU you want any sooner. Maybe you should actually take some of TheJian's money-saving tips from a few threads ago. Either that, or find a better way to pass the time.
Oxford Guy - Thursday, September 2, 2021 - link
'You're just pouting like a child'
That's rich coming from the poster who foists an ad hom on us in almost every response.
'No matter how much you rant about consoles and AMD, you're still not going to have the gaming GPU you want any sooner. ... Either that, or find a better way to pass the time.'
Another ad hom that completely misses the mark. I have stated, quite a few times, my reasons for making the effort to talk about the self-defeating mythology that surrounds PC gaming and 'consoles'.
You continue to try to turn everything into a referendum on my character. Ad hominem. Your debating skills are childish.
mode_13h - Tuesday, October 5, 2021 - link
> I have stated, quite a few times, my reasons for making the effort to talk
> about the self-defeating mythology that surrounds PC gaming
Why not indulge us, one more time?
> You continue to try to turn everything into a referendum on my character.
You make it almost impossible not to, with your continual whining, off-topic thread-jacks, leftist propaganda, and anti-AMD conspiracy theories.
> Your debating skills are childish.
If yours were any good, maybe you'd win a few people over to your side. Calling everything a fallacy, as if such allegations are self-evident, self-supporting, and themselves infallible, pretty soon comes off more as laziness than the elitist tone to which you seem to aspire.
shabby - Thursday, August 19, 2021 - link
Watch Intel give these away with every desktop at a discounted rate.
Drumsticks - Thursday, August 19, 2021 - link
Arc on N6 is interesting for a couple of reasons. First, as the article notes, there's the extra density advantage that Intel will have. But on top of that, presumably, it means that the capacity dedicated to currently manufacturing AMD and Nvidia N7 GPUs won't be just competing with an additional manufacturer; because Intel is on a separate fab line, presumably we will actually get an increase in the total number of GPUs being put out per month, rather than just seeing the same capacity spread across three companies. If that's true, I wonder if the number of GPUs AMD or Nvidia can produce on N6 relative to N7 might be a factor in them staying on N7 longer.
Granted, of course, I'm sure all of this has been worked out by all four companies well in advance, but it's exciting times. Here's to a third successful manufacturer, and more competition! Now all we need is for Nvidia to somehow jump into the CPU game as a viable desktop CPU maker, and we'll have a double triple market (not that this one is actually likely!)
Ryan Smith - Thursday, August 19, 2021 - link
"presumably, it means that the capacity dedicated to currently manufacturing AMD and Nvidia N7 GPUs won't be just competing with an additional manufacturer, because Intel is on a separate fab line"N6 makes significant use of TSMC's N7 infrastructure. So they should be considered shared lines for most matters.
dotjaz - Thursday, August 19, 2021 - link
N6 uses the same design rules, not the same litho machines. You can't just inject and remove EUV machines from the same production process. N6 would have its own set of EUV and DUV scanners (initially), likely upgraded from the discontinued N7+ line.
Drumsticks - Thursday, August 19, 2021 - link
Ah, drat. I guess on the possible bright side, is TSMC still bringing/converting additional capacity to 7(6)nm?
mode_13h - Friday, August 20, 2021 - link
> Nvidia to somehow jump into the CPU game as a viable Desktop CPU maker
It's not quite the same thing, but Nvidia is working with MediaTek on gaming-capable ARM SoCs for laptops and potentially even SFF desktop machines.
https://nvidianews.nvidia.com/news/nvidia-and-part...
(search for mediatek)
BedfordTim - Thursday, August 19, 2021 - link
Everything I read suggests there is substantial brand loyalty/hatred in the GPU market, but maybe that is a vocal few.
euskalzabe - Thursday, August 19, 2021 - link
Many people (stupidly) love to pick sides. With Intel in the game/war, it'll become one more side to pick. Who cares? If we get more GPUs and, with Intel needing to establish itself as a serious player, better price/performance ratios, this is nothing but welcome news in our very stagnant GPU industry.
mode_13h - Friday, August 20, 2021 - link
What gets people really animated is when they sense unfair dealing. Like if one GPU maker develops proprietary libraries tuned for their hardware, and incentivizes some game developers to use them.
Another big source of conflict is when companies misrepresent their products, such as via benchmark-rigging.
> very stagnant GPU industry.
It's certainly not stagnant in terms of feature set or performance!
Oxford Guy - Thursday, September 2, 2021 - link
'It's certainly not stagnant in terms of feature set or performance!'
Exactly. That Vega's IPC was identical to Fury X's and that AMD adopted a 'Polaris forever' mindset is the antithesis of stagnation.
mode_13h - Tuesday, October 5, 2021 - link
> That Vega's IPC was identical to Fury X's
In a previous thread, I already enumerated the myriad ways in which Vega improved over Fury X. However, all of that is beside the point. AMD has moved beyond GCN... why haven't you?
mode_13h - Friday, August 20, 2021 - link
Show me where there *isn't* brand loyalty! I guess lack of brand-loyalty would be the definition of a true commodity.
Pretty much every enthusiast-oriented PC product is subject to brand loyalty. Everything from cases to peripherals, cooling products, PSUs, motherboards, and obviously CPUs.
Oxford Guy - Monday, August 23, 2021 - link
‘Everything I read suggests there is substantial brand loyalty/hatred in the GPU market, but maybe that is a vocal few.’
Astroturf as a substitute for real competition.
mode_13h - Tuesday, August 24, 2021 - link
People are tribal, by nature. They're also often insecure about their purchases and try to validate their decisions. Brand-loyalty is certainly something the brands cater to, but it would exist even if they did nothing to foster it.
Oxford Guy - Thursday, September 2, 2021 - link
Less Sociological supposition and more to-the-point observation.
mode_13h - Tuesday, October 5, 2021 - link
> Less Sociological supposition and more to-the-point observation.
If that's meant to be a request, why? And if there's an error in my statement, why not mount a counter-argument?
name99 - Thursday, August 19, 2021 - link
"Intel the second GPU manufacturer to start including a systolic array for dense matrix operations"Apple is a GPU "manufacturer" and includes a systolic array for dense matrix operations...
The obvious one we know of is on the CPUs (though who knows, maybe there are also matrix units on the GPU or NPU?) but placing it on the CPU may well be a better location than on the GPU...
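For anyone who hasn't met the term, a systolic array is just a grid of multiply-accumulate cells that operands are pumped through, one step per clock. Below is a minimal, purely illustrative cycle-level simulation in Python of an output-stationary systolic matmul; it models the general dataflow only and makes no claim about how Intel's XMX or Apple's AMX are actually wired.

```python
import numpy as np

def systolic_matmul(A, B):
    """Simulate an output-stationary systolic array computing C = A @ B.
    Cell (i, j) owns accumulator C[i, j]; A operands flow rightward and B
    operands flow downward, skewed so matching values meet in the right cell."""
    N, K = A.shape
    K2, M = B.shape
    assert K == K2
    C = np.zeros((N, M), dtype=A.dtype)
    a_reg = np.zeros((N, M), dtype=A.dtype)  # operand latched in each cell, moving right
    b_reg = np.zeros((N, M), dtype=A.dtype)  # operand latched in each cell, moving down
    for t in range(K + N + M - 2):           # enough cycles to fill and drain the array
        # shift: A operands move one cell right, B operands one cell down
        a_reg[:, 1:] = a_reg[:, :-1].copy()
        b_reg[1:, :] = b_reg[:-1, :].copy()
        # feed fresh operands at the edges, skewed by row/column index
        for i in range(N):
            a_reg[i, 0] = A[i, t - i] if 0 <= t - i < K else 0
        for j in range(M):
            b_reg[0, j] = B[t - j, j] if 0 <= t - j < K else 0
        # every cell performs one multiply-accumulate per cycle
        C += a_reg * b_reg
    return C

A = np.random.randint(-4, 5, size=(4, 8))
B = np.random.randint(-4, 5, size=(8, 3))
assert np.array_equal(systolic_matmul(A, B), A @ B)
```

The attraction for hardware designers is that each cell can do a multiply-accumulate every cycle using only nearest-neighbour wiring, which is why the same structure shows up in TPUs and in these new GPU matrix engines.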
bwj - Thursday, August 19, 2021 - link
Google also includes a systolic array on their TPUs (and has since 2017).
name99 - Thursday, August 19, 2021 - link
Now that I've had a chance to see the whole thing, I'm still not sure what Intel's game plan is.
So we have XMX on Xe-HPG.
And AMX is apparently still a real thing.
So is the idea that (with the same mindset that made such a success of AVX-512) Intel plan to have TWO separate matrix ISA's.
One for (a small subset of) the plebs, attached to the discrete GPU via Xe-HPG.
And a second subset attached to (some subset of) Xeon's.
But nothing that is mass-market, attached to every Intel device going forward?
Oh well, we can fix it in post (ie via OneAPI)...
mode_13h - Friday, August 20, 2021 - link
> So is the idea that ... Intel plan to have TWO separate matrix ISA's.
Yes, but their GPUs and CPUs have different ISAs anyhow. More to the point, you can't afford a round-trip for running XeSS on CPUs, even if consumer CPUs already had AMX (which they don't, nor will Alder Lake change that).
Also, the XMX seems to occupy more of Xe-HPG than AMX does of Sapphire Rapids. So, it should be more economical to scale deep learning performance by slotting in a bunch of Xe-HPG cards into a server, than by building a server with more Sapphire Rapids CPUs. This is the gameplan Nvidia has successfully implemented since all the way back in the Pascal era.
Also, Xe-HPG has hardware video decoders, which you need to scale up in proportion with the deep learning horsepower, if you intend to do video analysis. Sapphire Rapids has no hardware video decoder (as far as we know).
> But nothing that is mass-market, attached to every Intel device going forward?
Wait until next generation. My guess is these XMX engines will become a standard feature of all their iGPUs and dGPUs, going forward.
> Oh well, we can fix it in post (ie via OneAPI)...
It's better than nothing. It serves the purpose of enabling a large amount of software to be written which can benefit from hardware they haven't even yet released.
name99 - Friday, August 20, 2021 - link
It (ie presence of XMX or AMX) does that when it's part of a considered plan to build out an ecosystem. Not when it's a mad scramble to appear relevant, governed primarily by financial considerations (ie higher prices to get AMX or XMX).
Or to put it differently, Intel is going to have to work hard to get that "large amount of software to be written" given the way they have handled AVX512. Yet every indication is that the exact same mindset as governed AVX512 is driving these extensions.
I don't want to be a grumpy gus; apart from anything else it's boring. But what I see here is a company that has forgotten that it sells chips because of an ECOSYSTEM, and that it therefore has to produce products based on the construction of an ecosystem. And if that means sometimes delaying a feature to get it right, or putting it on every product rather than charging for it, or ensuring uniformity across implementations, well, that's what being a responsible ecosystem steward means.
It's OK to make mistakes (TSX? AVX512?) It's not OK to learn nothing and change nothing after those mistakes.
Ryan Smith - Thursday, August 19, 2021 - link
Apple is not a discrete GPU manufacturer, which is where I was going with that. But as you are technically correct, the chair concedes the point, and I've updated the article text accordingly.
mode_13h - Friday, August 20, 2021 - link
Did Intel call it a "systolic array", or is that how you imagine they implemented it?
mode_13h - Friday, August 20, 2021 - link
> placing it on the CPU may well be a better location than on the GPU
Sapphire Rapids will include AMX, which is probably in the same ballpark.
kpb321 - Thursday, August 19, 2021 - link
The theoretical 8-slice TFLOPS ends up being right around the 3060 Ti. So if that's the biggest GPU they have designed for HPG, they are going to be hitting the low-to-mid-range part of the market and not competing with the 3080/3070 (or their successors) end of the market.
thestryker - Friday, August 20, 2021 - link
The RTX 3060 Ti has a theoretical max of 16.2 TFLOPS and the RTX 3070 20.31 TFLOPS, so this guess splits the difference between the two. I hope it's closer to the 3070, but at the end of the day, real-world performance, price, and availability will mean everything.
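As a sanity check on those figures, the theoretical number is just FP32 lanes x 2 FLOPs per fused multiply-add x boost clock. A quick sketch in Python: the Ampere inputs use Nvidia's published shader counts and boost clocks, while the Xe-HPG lane count (4096 for a full 8-slice part) and clock range are assumptions, not announced Intel specs.

```python
def peak_fp32_tflops(fp32_lanes: int, boost_ghz: float) -> float:
    """Peak FP32 rate, assuming one fused multiply-add (2 FLOPs) per lane per clock."""
    return fp32_lanes * 2 * boost_ghz / 1000.0

print(peak_fp32_tflops(4864, 1.665))  # RTX 3060 Ti: ~16.2 TFLOPS
print(peak_fp32_tflops(5888, 1.725))  # RTX 3070:    ~20.3 TFLOPS

# Hypothetical full 8-slice Xe-HPG part: 4096 FP32 lanes and a 2.0-2.2 GHz
# clock are assumed here, not confirmed Intel specifications.
print(peak_fp32_tflops(4096, 2.0))    # ~16.4 TFLOPS
print(peak_fp32_tflops(4096, 2.2))    # ~18.0 TFLOPS
```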
bwhitty - Thursday, August 19, 2021 - link
What does having RT units in HPC/Ponte Vecchio mean? Who's going to be running DX12 ray tracing on the server?
Some cloud gaming service? Or are RT units used for offline rendering too?
Rudde - Friday, August 20, 2021 - link
RT is for HPG, not HPC.
bwhitty - Friday, August 20, 2021 - link
That's not true. See https://www.youtube.com/watch?v=eieRXlaDuzo at about 4:40.
So my question stands: what do you do with RT units in HPC?
mode_13h - Friday, August 20, 2021 - link
Assuming you're correct, we could look to Nvidia for an explanation. They provided RT-enabled workstation/server cards for the film and video production industry to put into render farms.
Another use case is for cloud-based gaming, like Google Stadia and GeForce Now.
Still, I can't answer why they couldn't just use a bunch of server-oriented (e.g. passively-cooled) HPG cards for that.
Frenetic Pony - Thursday, August 19, 2021 - link
So much matrix math is a bit dumb. What are you going to use it for, Intel? The only use right now, or even in the foreseeable future, is stuff like your upscaling, which you added just so you can say you have AI; other upscalers match that quality and speed without AI.
Oh well, it's competition and more supply. So bring it on!
mode_13h - Friday, August 20, 2021 - link
Nvidia originally advertised it as being useful for global illumination, where a deep learning model can interpolate lighting intersections better than conventional methods.
Of course, the other reason to have it there is that Intel probably intends to serve multiple markets with these cards, just like Nvidia does with their gaming cards. Turing and now Ampere gaming GPUs are mounted on passively-cooled cards with no video connectors and sold (with a big markup) as inferencing accelerators.
https://www.nvidia.com/content/dam/en-zz/solutions...
Lastly, if they can use a gaming GPU for deep learning classes, it gives university students an excuse to hit up their parents to fund an upgrade. This is an area where AMD's RDNA cards are losing out.
name99 - Friday, August 20, 2021 - link
Can you do interesting completely novel things with this? For example fluid mechanics?
I could imagine that as an (unmentioned) driver of these matrices; the idea being that being able to simulate fluid mechanics well opens up a whole new regime in gaming.
mode_13h - Saturday, August 21, 2021 - link
> Can you do interesting completely novel things with this? For example fluid mechanics?
No, probably not at such low precision. Not for anything serious, anyway. AMX has the same issues. It's too closely-tailored to deep learning use cases.
> being able to simulate fluid mechanics well opens up a whole new regime in gaming.
I've seen game engine demos for years, showing a suitable approximation of CFD for entertainment purposes. It seems to be the case that dedicated matrix engines aren't even needed for that.
I take your point that matrices are a powerful mathematical tool, but the precision they implemented is a pretty hard limitation to work around.
name99 - Sunday, August 22, 2021 - link
Is the high precision absolutely absent? Or just not boasted about in the marketing slides?
For comparison, Apple's AMX supports the full range of multiplies you'd expect, from 64b doubles through 32b floats to 16b half floats and 16b ints.
(Dougall did not find 8b int multiplies but that may be a limit in his reverse engineering techniques, not in the instruction set. Or Apple could add them if they become relevant.)
My point is it's interesting that Apple seem to have designed AMX very much as a generic matrix multiply accelerator (up to double precision) hence capable of accelerating PDE solutions, like I suggested.
It would be sad (though, unfortunately very much in character) if Intel instead crippled both their offerings by obsessing over serving one market, namely neural nets, at the expense of future possibilities.
mode_13h - Monday, August 23, 2021 - link
> Is the high precision absolutely absent?
Since they only talk about fp16 and int8, I'm assuming so. Furthermore, I'll bet that "fp16" actually means BFloat16, since it takes even less silicon to implement than IEEE 754-2008 half-precision and is better-suited to deep learning.
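For context on that bet: BFloat16 is essentially a float32 with the low 16 mantissa bits chopped off, so it keeps fp32's 8-bit exponent (and dynamic range) while needing a smaller multiplier than IEEE fp16's 10-bit mantissa. A tiny illustrative sketch in Python, using plain truncation and ignoring the rounding modes real hardware would apply:

```python
import struct

def fp32_to_bf16(x: float) -> float:
    """Truncate an IEEE-754 float32 to bfloat16, then widen it back to float.
    bfloat16 keeps the sign bit and the full 8-bit fp32 exponent but only 7
    mantissa bits, trading precision for range and a cheaper multiplier."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]

print(fp32_to_bf16(3.14159265))  # 3.140625: same range as fp32, ~3 significant digits
print(fp32_to_bf16(1.0e30))      # still finite; IEEE fp16 would overflow to infinity here
```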
The Tensor cores in Nvidia's Turing GPUs supported only fp16, int8, and int4. That will have been what Intel saw, while designing these first-gen XMX units.
> Apple's AMX supports the full range of multiplies you'd expect
That's cool. Is it built into their CPU or GPU, though?
The Tensor cores in Ampere GA102+ GPUs also now support BFloat16 and fp32.
https://www.nvidia.com/content/PDF/nvidia-ampere-g...
The "Matrix Cores", in AMD's CDNA support fp32, fp16, bf16, and int8.
https://www.amd.com/system/files/documents/amd-cdn...
> It would be sad (though, unfortunately very much in character) if Intel
> instead crippled both their offerings by obsessing over serving one market
Well, it's a gaming GPU and die space is expensive. I don't really see how it's sad they left off fp32 and fp64, if almost none of its target users need it. Their compute-oriented dies will probably be more general/capable.
I think you should save that moral indignation for the limitations of the AMX capabilities in the forthcoming Sapphire Rapids CPUs.
name99 - Monday, August 23, 2021 - link
Apple AMX is *nominally* built into the CPUs. What this means is that the AMX instructions are placed in the CPU instruction stream and look like any other AArch64 instruction. They are decoded like normal, and placed in the Load/Store unit queue. When they reference the rest of ARM64 (like using an integer register to form an address, to load a matrix register) everything behaves as you would expect.
One possibility is that AMX is literally part of the CPU just like eg the NEON units and registers. That's what it looks like from the outside.
A second possibility is that in fact the AMX units (execution and registers) sit up in the L2. Conceivably there could be all sorts of variants of this, like 4 AMX units and 4 sets of registers. Or one AMX unit and 4 sets of registers. Or one unit, one set of registers, and if two cores want to use AMX, the registers are transparently spilled to and from L2.
I have not seen any data to allow me to choose between all these options, but all of them are justifiable. At some point perhaps I'll get to writing AMX test code, but it's much lower priority than "mainstream" CPU probing.
Worth noting however is that AMX is apparently also present on the E cores (or at least present in the E core cluster).
Presumably, to the extent a generic matrix multiply engine is useful for GPU work, one will be in the GPU as well. But Apple have the luxury that they can place HW where it makes the most sense, not based on what they sell.
nV can't do that (has to be in the GPU)
AMD can't do that (hard for them to define a full set of new x86 matrix instructions and get them accepted)
Intel can't do that (wait, why not? Well, yes indeed, why not?...)
I think what we can say is that
- Apple sees value in a low precision multiply unit for the NPU
- Apple sees value in a high precision multiply unit for the CPU
- Apple may (who knows?) see value in an intermediate precision multiply unit for the GPU
- Intel sees value in a low precision multiply unit that they can sell as being relevant to NNs, but since they don't have a close to integrated story about how their NPU fits in with the CPU or GPU, they're putting that low precision multiply unit on the GPU and the NPU.
- If Intel sees value in a high precision multiply unit on the CPU, well, for the company that just effectively cancelled AVX512, they have a strange way of showing it.
...........................
The issue of "almost none of its target users need it" is again not purely technical so I'm not going to fight about it. But the parts that are tech adjacent are
- construction of an *ecosystem* depends on close to universal presence of a feature. And if Intel is in the business of selling features, not constructing ecosystems, well they've forgotten everything that gave them 50 yrs of business success.
- customers don't know what they want until you give it to them. And honestly die space is not expensive. What is expensive is continuing to operate with a 90's mentality about the cost of transistors even when transistors are so ridiculously cheap. Rebelling against this mentality ("we can't add everything that sounds good bcs silicon is expensive") is precisely why Apple is able to produce so much more desirable chips than ARM at perfectly reasonable costs and die sizes.
mode_13h - Tuesday, August 24, 2021 - link
> Apple AMX is *nominally* built into the CPUs.
Okay, that's what I expected. It makes a lot more sense to have a full complement of precision-support in the CPU than the GPU. Even there, Intel is a lot more interested in pursuing specialized applications than doing what's most generally useful.
> And if Intel is in the business of selling features, not constructing ecosystems,
> well they've forgotten everything that gave them 50 yrs of business success.
This is a good point. I wish you could tell it to the clowns at AMD who don't care about providing good GPU-compute support on their consumer graphics cards.
> honestly die space is not expensive.
It is at GPU-scale! That's why gaming GPUs provide only 8:1 or 16:1 fp32:fp64 support.
> Rebelling against this mentality is precisely why Apple is able to produce so much
> more desirable chips than ARM at perfectly reasonable costs and die sizes.
I don't agree with that. Apple's products sell at premium prices, and this not only funds their silicon manufacturing but also the considerable degree of engineering that goes into it.
And answer me this: how much fp64 do Apple's GPUs provide?
mode_13h - Saturday, August 21, 2021 - link
FWIW, AMD took a much more expansive view of the problem, in the Matrix cores they developed for the MI100 (Arcturus).
Nvidia has also been generalizing their Tensor cores. So, maybe this first step from Intel is just that.
Gondalf - Saturday, August 21, 2021 - link
Ummmm, apparently in Ponte Vecchio the compute tiles are on TSMC 5nm, and other tiles are on vanilla 7nm from both TSMC and Intel.
This time Intel have done things well in both consumer (TSMC 6nm) and server (compute on TSMC 5nm).