
  • jjj - Monday, August 1, 2016 - link

    Makes the ARM Mali presentation at Hot Chips even more interesting as mobile and PC GPUs become more and more alike.
  • Michael Bay - Monday, August 1, 2016 - link

    I thought nV was explicit about mobile-centric design right around Maxwell and cited it as a reason for the power efficiency increase? Kind of like Intel did a while earlier.
  • telemarker - Monday, August 1, 2016 - link

    Link to how this is done on Mali (hotchips last year)
    https://youtu.be/HAlbAj-iVbE?t=1h5m
  • Alexvrb - Monday, August 1, 2016 - link

    PowerVR was doing it before it was cool. :D On mobile and desktop - had a Kyro and a Kyro II back in the day when they were kings of efficiency on a dime.
  • Strunf - Tuesday, August 2, 2016 - link

    Yes, but as far as I remember they had some problems with overlapping objects, like wheels not showing up properly and so on. Probably fixable with drivers but I think they gave up on the desktop market within 1 or 2 years after their comeback.
  • Scali - Tuesday, August 2, 2016 - link

    Those weren't hardware or driver problems, as mentioned elsewhere.
    They were caused by poorly designed software, which assumed that the z-buffer was 'just there', which is not the case on PowerVR hardware. Between rendering passes, PowerVR behaves the same as a device with a full hardware z-buffer. However, if you don't mark your rendering passes properly in your code, you get undefined results.
  • TessellatedGuy - Monday, August 1, 2016 - link

    This is why nvidia will always be miles ahead of amd in perf/watt and architecture
  • Remon - Monday, August 1, 2016 - link

    Yes, the whole 2 years they've been more efficient than AMD will dictate the future...
  • milli - Monday, August 1, 2016 - link

    I was thinking exactly the same.
    People seem to forget that it was ATI that was ahead in performance and market share up until 10 years ago and it's just the past two years that nVidia has taken a serious lead in market share and perf/watt. Things change, especially in markets like these.
  • close - Monday, August 1, 2016 - link

    His name is TessellatedGuy and we all know AMD doesn't do tessellation that well. This means that using AMD might make him look like crap... or something :).
  • looncraz - Monday, August 1, 2016 - link

    AMD was well ahead of nVidia in tessellation at one point. And, today, they are basically even (technically far ahead if you consider clock rates).

    Compare the RX 480 tessellation to the GTX 1060 tessellation:

    AMD RX 480 Tessmark: 2240
    NV GTX1060 Tessmark: 2269

    And that RX 480 is likely running at ~1.2 GHz, while the 1060 is running at ~1.8 GHz.
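    A quick per-clock comparison of the numbers quoted above, as a minimal C++ sketch (the clock speeds are the poster's estimates, not measured values):

        // Tessmark points per MHz, using the figures from the comment above.
        #include <cstdio>

        int main() {
            const double rx480_score   = 2240.0, rx480_mhz   = 1200.0; // ~1.2 GHz (estimate)
            const double gtx1060_score = 2269.0, gtx1060_mhz = 1800.0; // ~1.8 GHz (estimate)

            std::printf("RX 480  : %.2f points/MHz\n", rx480_score / rx480_mhz);     // ~1.87
            std::printf("GTX 1060: %.2f points/MHz\n", gtx1060_score / gtx1060_mhz); // ~1.26
            return 0;
        }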
  • Scali - Monday, August 1, 2016 - link

    Why would you look only at the clockspeed, the most meaningless spec of all?
    Let's look at the rest:

    RX480:
    Die size: 230 mm²
    Process: GloFo 14 nm FinFET
    Transistor count: 5.7 billion
    TFLOPS: 5.1
    Memory bandwidth: 256 GB/s
    Memory bus: 256-bit
    Memory size: 4/8 GB
    TDP: 150W

    GTX1060:
    Die size: 200 mm²
    Process: TSMC 16 nm FinFET
    Transistor count: 4.4 billion
    TFLOPS: 3.8
    Memory bandwidth: 192 GB/s
    Memory bus: 192-bit
    Memory size: 6 GB
    TDP: 120W

    Pretty sad that the RX480 can only reach GTX1060 speeds, with higher specs all around (the memory interface is actually the same spec as GTX1070).
  • gamervivek - Monday, August 1, 2016 - link

    If you're the same Scali, then I can see why they banned you from beyond3d.
  • medi03 - Monday, August 1, 2016 - link

    The only higher spec here is the die size.
    For number of transistors: that's the trade off. AMD packs more of them into the same area, but then runs them at lower clock.

    Mem bandwidth is likely irrelevant.
  • Scali - Monday, August 1, 2016 - link

    "Mem bandwidth is likely irrelevant."

    I wouldn't be too sure of that.
    What Tessmark does is little more than displacement-mapping a simple sphere.
    In a synthetic test like that, perhaps the bandwidth can affect the speed at which the displacement map can be sampled, and therefore can have an effect on the total tessellation score.

    It's an interesting area to explore by testing cards at different memory speeds, while leaving the GPU at the same clock.
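    A minimal sketch of the experiment proposed above: hold the GPU clock fixed, vary only the memory clock, and see how much of the score change tracks the bandwidth change. The scores below are made-up placeholders; only the method is the point.

        #include <cstdio>

        int main() {
            // (memory clock in MHz, Tessmark score) at a fixed GPU clock -- hypothetical values.
            const double mem_lo = 3000.0, score_lo = 2050.0;
            const double mem_hi = 4000.0, score_hi = 2240.0;

            const double bw_gain    = (mem_hi - mem_lo) / mem_lo;       // +33% bandwidth
            const double score_gain = (score_hi - score_lo) / score_lo; // +9.3% score

            // ~0 means bandwidth-insensitive, ~1 means fully bandwidth-bound.
            std::printf("bandwidth sensitivity: %.2f\n", score_gain / bw_gain);
            return 0;
        }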
  • TheJian - Tuesday, August 2, 2016 - link

    You seem to be missing the point. Based on specs AMD should be leading this competition by >10%. Larger die, 1.3B extra transistors, 30 more watts, smaller process, 1.3TF extra, and far more bandwidth. Despite ALL of those specs leading on AMD's side, they LOSE to 1060 except in a few games they help fund to optimize (AOTS etc).

    https://www.reddit.com/r/nvidia/comments/4tyco0/th...
    240 benchmarks over 51 games in a nice spreadsheet if you wanted it :)

    findings:

    On average, a 1060 is 13.72% better than a 480.
    On average, when using DX11, a 1060 is 15.25% better than a 480.
    On average, when using DX12, a 1060 is 1.02% worse than a 480.
    On average, when using Vulkan, a 1060 is 27.13% better than a 480.
    AMD won in 11.36% of DX11 titles, 71.43% of DX12 titles, and 50% of Vulkan titles.
    The most commonly reviewed games were AotS, Hitman, RotTR, the Division, GTA V, and the Witcher 3.
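    For reference, a "X% better" figure like the ones above is typically derived per game and then averaged; a small sketch (the FPS values are placeholders, not taken from the linked spreadsheet):

        #include <cstdio>
        #include <utility>
        #include <vector>

        int main() {
            // (GTX 1060 fps, RX 480 fps) for a handful of hypothetical games.
            const std::vector<std::pair<double, double>> games = {
                {92.0, 81.0}, {61.0, 55.0}, {45.0, 47.0}, {120.0, 103.0}
            };

            double sum = 0.0;
            for (const auto& g : games)
                sum += (g.first / g.second - 1.0) * 100.0; // % advantage in this game

            std::printf("average advantage: %.2f%%\n", sum / games.size());
            return 0;
        }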

    Toss in a Doom fix coming soon (the Bethesda FAQ says async isn't working on Pascal yet; HardOCP verifies this, showing older 1.08 libraries used instead of 1.0.11 or 1.0.17 etc.), plus NV surely concentrating on DX12 at some point, and you should get the picture. It's not going to get better for AMD, who quit DX11 ages ago and moved to DX12/Vulkan purely due to a lack of funding to do DX11 at the same time (and a hope we'd move to Windows 10 while it was free... LOL). It's comical to me that people think AMD will win based on the games you can count on your hands, while NV is concentrating on where 99% of the games and users are (DX11/OpenGL/Win7/Win8 etc.).

    AMD gave up on DX11 when Nvidia beat them in their own Mantle test game, Star Swarm. It was tested here at AnandTech and shown winning once NV concentrated on showing up Mantle with DX11. AMD revenue and R&D have tanked for the last 3-4 years in a row, so it should be no surprise to anyone how this is working out while they spend what little R&D they have chasing consoles instead of CORE products/drivers like NV decided to do. You can be sure Vega/Zen would both be here ages ago if they hadn't chased Xbox One/PS4, and now their new replacements are coming soon yet again. Both are late because AMD couldn't fund (money- or people-wise) consoles AND their core GPU/CPU/driver departments at the same time.

    Zen had better be at least as big as the Xbox One SoC (363 mm^2) or they can kiss a victory over Intel goodbye (and the pricing power that comes with that win). If they want the 58% margins they had back in 2006, that die better be huge and headed for HEDT territory from $400-1730, just like Intel. They are killing themselves chasing the bottom of everything: i.e. the 480 instead of 1070/1080/Titan (which together sell as many units as the 480 or more), and single-digit console margins (AMD said so early on; only "mid teens" now, which means <15% or they'd say 15%) vs. fat margins on high-end CPUs. When they chose Xbox One development, which likely started >6 years ago, they delayed their Zen plans. We all love great pricing, but AMD needs great PROFITS and MARGINS.

    I will gladly buy a Zen and pay a premium if it WINS, and same with Vega, but I will ignore both if they don't win. Bargain crap means nothing to me, as I expect to buy the winner in both categories at over $300 next (considering a Titan due to content-creation aspirations, and HEDT Intel). Produce a winner or end up in the bargain bin like today, with no CPU over $153 on Amazon. You should be in the $400-7000 market like Intel! There is no profit in bang-for-buck crap when you have to pay $180 mil just to cover interest. You need MARGINS.

    Not quite sure why you can't see all the higher specs and AMD still failing here. They should be winning everything with the 480, judging by all the specs. WTH are you looking at? If you're running ~20% more watts, TFLOPS, die size, transistors etc. and not winning everything, YOU have failed. At worst an Nvidia-sponsored title should match you; you should not lose.
  • wumpus - Wednesday, August 3, 2016 - link

    The entire point of this optimization is almost certainly to reduce the bandwidth load on main memory. Nvidia found a way to make certain that nearly every single write to the display buffer hits cache. Looks like memory bandwidth is certainly relevant (although we can't be sure if it was done for power consumption or purely for pushing pixels).
  • haukionkannel - Monday, August 1, 2016 - link

    AMD uses a more complex architecture that is more suitable to different kinds of work, and that is why it is so good in DX12 and Vulkan. Nvidia has a very "simple" architecture that is tailored around DX11, and it is super efficient in that environment. But it loses its edge when moving to more complex environments, because it is not as flexible. That is why AMD has very high compute power vs. draw power compared to Nvidia. Nvidia's new GP100 chip is more compute-oriented, but for graphics it is not faster than GP102. It may, however, be more suitable in a DX12 environment.
    So Nvidia's next generation may not be as efficient as what they have today. The interesting thing is how close they can get. And in DX12 tasks they will see a good jump compared to Pascal. But that is a different matter.
    You can't easily compare different architectures, because they are good at different tasks.
  • Scali - Monday, August 1, 2016 - link

    Poppycock.
  • silverblue - Tuesday, August 2, 2016 - link

    I think the issue for AMD is that they were spooked somewhat by NVIDIA's brute-force focus on compute with Fermi, to the point that it took them two generations to fully do the same; at that point Kepler's reduced compute focus caught them off guard again. However, they never really had the same tessellation performance until now. Polaris 10 is obviously denser than GP106 and is definitely more rounded, but pays for it in power consumption; additionally, the RX 480 is clocked slightly above the sweet spot for Polaris 10, which may account for a bit of the power draw.

    I think the question here is which implementation looks to be more forward thinking rather than who has the best specs or the best power consumption. The 1060 is slightly quicker at the moment but not by NVIDIA's handpicked percentages.

    TBR is a major performance boost that facilitates cheaper hardware, but whilst it's not a universal performance booster, it's better to have than not. It's very welcome back in the desktop space.
  • piroroadkill - Monday, August 1, 2016 - link

    It's funny, because ATI practically introduced tessellation to the consumer graphics card, but no games developers took advantage of it at the time, so work on it was stopped.

    Look up TruForm.
  • extide - Monday, August 1, 2016 - link

    Yeah, I think the original Radeon (R100) had the earliest implementation of TruForm, ATI's tessellator.
  • Alexvrb - Monday, August 1, 2016 - link

    Very true. They were ahead of the times... and it bit them in the behind. Oh well. They're doing well enough I think with the 480, puts them on solid footing for the foreseeable future.
  • mr_tawan - Tuesday, August 2, 2016 - link

    ATi were bad with developer relationship. That's the reason why no one made use of ATi shiny tech (it's not only TruForm that suffer from this). AMD is now trying to recover from that by open-sourcing their libraries and tools. I am excited to see more from them.

    Game coding is hard, and if you're throwing new tech into it, it becomes much harder. That's one of the reasons game developers stick with the tried and true (AFAIK many games' codebases are still C99 or even C89). If they want those devs to move, they have to lobby them one way or another. That's what Nvidia has been doing very well for a long time. Having only a shiny tech demo doesn't generate any excitement among those devs...
  • Scali - Tuesday, August 2, 2016 - link

    "ATi were bad with developer relationship. That's the reason why no one made use of ATi shiny tech (it's not only TruForm that suffer from this)."

    TruForm was a standard feature in the D3D API: N-patches. As were the GeForce 3's competing RT-patches, of which about as little was heard as from TruForm, perhaps even less.
    See: https://msdn.microsoft.com/en-us/library/windows/d...
    The reason is: both technologies were extremely limited, and it was difficult enough to design game content in a way that looked good with N-patches or RT-patches, and even more difficult to design content that looked good with N-patches, RT-patches, and no patches alike. Because the market your games target will contain a combination of all three types of hardware.
  • wumpus - Tuesday, August 2, 2016 - link

    It isn't just that, it's market share as well (although if you are talking about Mantle, at least all modern consoles are driven by Mantle-derived systems). If the market is overwhelmingly Nvidia, you don't program for AMD.
  • Scali - Tuesday, August 2, 2016 - link

    "all modern consoles are driven by Mantle-derived systems"

    The opposite actually: both Microsoft and Sony developed their own APIs with low-level features, long before Mantle arrived.
  • KillBoY_UK - Monday, August 1, 2016 - link

    and when the tables turned, AMD fans suddenly said they didn't care about PPW and general power usage lol
  • Mr.AMD - Monday, August 1, 2016 - link

    I truly didn't, and you NV fans could also stop caring...
    Because all GPUs use more power when you OC them; NVIDIA cards also used more than 300 W at full OC. Not that i care, true performance lay in stock cards. If stock performance is good why the need for OC? Cards become more power hungry, run hotter, shorter life....
  • Budburnicus - Tuesday, May 2, 2017 - link

    "Not that i care, true performance lay in stock cards. If stock performance is good why the need for OC? Cards become more power hungry, run hotter, shorter life...."
    LMAO OC does NOT shorten life significantly for any card. Only stupid noobs say that shit. Because A) whatever amount your OC shortens the lifespan by is insignificant! I still have a GTX 460 in my basement that I had OCed and running for 6 years in one PC or another, similar deal with a GTX 560 Ti - both of which are MASSIVELY hot and power hungry chips, and even with ~10%+ OC running on both of them for YEARS, they both still run at those same OCs. Point being - would YOU want to still be using a GTX 460 or 560 Ti today?? Because I sure as hell would not! Even with my CURRENT G1 Gaming GTX 980 Ti at 1505 core / 8 GHz VRAM (8.5 TFLOPS, 145 Gpixel/sec, and 265 Gtexel/sec at this frequency, which it runs 100% of the time under full graphics load) - this GPU will last me FAR LONGER than I will ever need it to! By the time it is likely to have ANY FAILURE it will be nearly obsolete! And B) if you are SMART ABOUT OVERCLOCKING then you will KNOW the safe operating temperatures of your GPU and all of its components! As long as you are NEVER going higher than the approximately 85 C upper safe range for ANY GTX 900 series GPU - you are not doing ANY SIGNIFICANT DAMAGE!

    As for ''Not that i care, true performance lay in stock cards.'' BAHAHAHAHAHAH what are you smoking m8? Can I have some?

    980 Ti stock = 6 TFLOPS at a reference boost of just 1076 MHz, with nearly exactly 96 Gpixel/sec, 176 Gtexel/sec and just 336 GB/sec of VRAM bandwidth

    G1 Gaming 980 Ti @ 1505 core and 8 GHz VRAM = 8.476 TFLOPS, 144.5 Gpixel/sec, 264.9 Gtexel/sec, 384.4 GB/sec!

    ALL while it NEVER gets hotter than just 67 C! Thermocouples showed not more than 78 C on the VRM, not more than 70 C on the VRAM, and never more than 80 C at any single heat point on the ENTIRE card! Now you probably don't know ANY of this, but the safe temps for the VRM FETs used on the G1 Gaming are in excess of 125 C - and this is rather standard for FETs.

    Even WITH a 475 watt system power draw under gaming, the cost per hour of my PC is minuscule in my state.

    Yes my GPU sucks A TON of power when I am GAMING! But when just watching Netflix, surfing etc, my ENTIRE PC never draws more than 110 watts at the wall. Under a HEAVY load like Prime95 and Furmark that does jump as high as 830 watts, but in gaming it never exceeds about 475 watts total system power draw. And when you take into consideration the massive OC on my i7-3770k at 4.7 GHz, the four 2.5" SSDs (3x RAID 0), one 4 TB HDD, 8 case fans and 2 CPU fans, as well as MANY USB peripherals, all of this is rather normal.

    Also, I have an i7-2600k that has been running at 4.5 GHz since the DAY it was bought in its RELEASE month, Jan 2011 - it still runs flawlessly. And your argument TOTALLY DIES with the fact that Pentium 4 CPUs still work today!

    As long as heat and power are REASONABLE - you are NOT DEGRADING THE LIFESPAN OF ANY PC PART BY ANY SIGNIFICANT AMOUNT OF TIME! The Part you are OCing will be LONG old and nearly obsolete before the effects of Overclocking take ANY TOLL!

    AND electronics just sometimes break. But modern parts will NOT BREAK from excessive heat or power, they will shut down before that becomes an issue. You can download MSI Afterburner, and just MAX out EVERY SLIDER - it will NOT KILL your GPU unless it had a weak or failing component to begin with - but what it WILL do is shut down the MOMENT you put it under stress! Same goes for CPU frequencies and BIOS OC settings!
  • TessellatedGuy - Monday, August 1, 2016 - link

    Lol true. Electricity bills would make them care afterwards tho.
  • bigboxes - Tuesday, August 2, 2016 - link

    Nonsense. I currently have a GTX 970 in my main rig that I run 24/7. I couldn't care less about power usage. I have four computers running 24/7. My biggest power draws are my HVAC, refrigerator, dryer and water heater. The video card is way down the list. All I care about is performance/$$. I'm looking to go 4K in 2017. I'll evaluate everything after Vega is released.
  • Remon - Monday, August 1, 2016 - link

    It was the other way around. Suddenly with Maxwell, efficiency was important...
  • Alexvrb - Monday, August 1, 2016 - link

    Yeah I was just gonna say I hardly saw any ATI/AMD guys who were really diehards over power efficiency. Now Nvidia guys, they just didn't talk about it until it became a good talking point for their team. Meanwhile I use cards from both vendors from time to time, but price/performance is the biggest factor. Electricity is fairly cheap and the GPU is idle most of the time.
  • Chaser - Tuesday, August 2, 2016 - link

    Oh please. Nvidia has both a performance edge AND an efficiency edge.
  • wumpus - Tuesday, August 2, 2016 - link

    AMD fans wanted performance and talked about performance for the price. And AMD typically delivers: it took a 1060 to really beat the 390 at its own game, and now it has to contend with the 480. (Just don't expect the high end to make much sense.) AMD marketing pushed PPW, especially leading up to the 480 launch and even during it. Pretty embarrassing to have it use the same power as the 1080.
  • Scali - Monday, August 1, 2016 - link

    Well, NVidia has been ahead in performance and market share quite a few times.
    In fact, prior to the Radeon 8500/9700-era, ATi never made any cards that were very competitive in terms of performance. Back then, NVidia ruled the roost with the TNT and later the GeForce.
    And when the GeForce 8800 came out, ATi again went a few years without much of an answer.
    NVidia has been ahead of the pack more often and for longer periods than anyone else in the industry.
  • JoeyJoJo123 - Monday, August 1, 2016 - link

    Nice shilling, boy.

    Do you recall the GTX 480 disaster? Woodscrew and house-fire memes all around. At that time the HD 5000 series, particularly the 5850, were _THE_ midrange cards to get, and they were further popularized by the red team's better performance at Bitcoin mining.

    Either way, both companies suck and should be providing better products for their customers.
  • Scali - Monday, August 1, 2016 - link

    "Do you recall the GTX 480 disaster? Woodscrew and house fire memes all around."

    What's that have to do with it? It still had the performance crown.
    It's obvious who the shill is here.
  • qap - Monday, August 1, 2016 - link

    It's funny someone calls the GTX 480 a "disaster". It was not a good card, but at least it was the fastest card on the market.
    For the last few years AMD has sold cards with comparable power consumption and acoustics that are not even close to the top (basically anything based on Hawaii/Grenada), and no one labels them that way.
  • Scali - Monday, August 1, 2016 - link

    Must be something in that '480' name... The RX480 was also a disaster, breaking PCI-e power limits.
  • Alexvrb - Monday, August 1, 2016 - link

    People break the PCI-E limit all the time. It was never something people freaked out about. Anyway they released the fix in like a week. If it wasn't something they could fix, then they'd have been in trouble and would have had to offer a recall. With that said they needed to get their eyes drawn to the issue so they could correct it, but it was hardly a disaster. More like a footnote. These days I get aftermarket OC models and sometimes OC them some more. Probably breaking spec in the process.
  • StrangerGuy - Tuesday, August 2, 2016 - link

    Nobody could care less if specs are broken by OCing, but it's always funny when somebody states something like "breaking the PCI-E limit all the time" as gospel when no card at stock, custom or not, has so far been proven to do that except the unfixed reference RX 480.
  • Scali - Tuesday, August 2, 2016 - link

    "With that said they needed to get their eyes drawn to the issue so they could correct it"

    Wait, you're saying that a company such as AMD doesn't actually test the powerdraw on its parts in-house, before putting them on the market, and they need to rely on reviewers to do it for them, to point out they're going out-of-spec on the PCI-e slot, because there's no way AMD could have tested this themselves and found that out?

    Heh. What I think happened is this: AMD tested their cards, as any manufacturer does... They pushed the card to the power limits, and slightly beyond, to eke out as much performance as possible, because they need everything they can get in this race with NVidia. They then said: "Nobody will notice", and put it on the market like that.
    Problem is, people did notice.
  • silverblue - Tuesday, August 2, 2016 - link

    Sadly, much like the GTX 970's memory clocks.
  • Alexvrb - Wednesday, August 3, 2016 - link

    I didn't say they weren't aware. I said they needed their eyes drawn to the issue = needed their attention focused on it. Kind of like Nvidia and the 3.5GB debacle. They KNEW all along. The difference is one was fixed in a week with a driver (thanks to all the negative attention) and one is permanent. Yet the same people slamming AMD for their screwup here would gladly defend Nvidia's slip.

    The reason I brought up OCing is that I'm trying to point out that it is done all the time and it's not as big of a deal as the frothing anti-AMD boys would have you believe. Oh and there HAVE been custom models that broke PCI-E limits as configured by the manufacturer. Nobody batted an eye. Many users broke the limit even further by bumping clocks further.
  • Ro_Ja - Wednesday, August 10, 2016 - link

    With the GTX 750 Ti also doing it, I'm not sure why you're blabbing on about this RX 480 power limit 'disaster'. Almost any GPU will do this nowadays.
  • fanofanand - Tuesday, August 2, 2016 - link

    As someone who went from a 512MB 8800 GTS to an HD5850, I concur. ATI won that round and that's why I switched.
  • KillBoY_UK - Monday, August 1, 2016 - link

    More like 4. Since Kepler they have had better PPW; with Maxwell the gap just got a lot wider.
  • TessellatedGuy - Monday, August 1, 2016 - link

    Oh shit I've offended fanboys by saying the truth. Or maybe saying good things about nvidia isn't allowed because of their bad practices. C'mon, are you gonna be swayed to not buy an objectively better product just because the company is bad?
  • close - Monday, August 1, 2016 - link

    Yes, you will definitely get a prize for *LIKING* a specific product. A fanboy is a fanboy no matter what he cheers for ;). And usually fanboys on any side aren't that bright to begin with.
  • TessellatedGuy - Monday, August 1, 2016 - link

    Also, looking at how the 14nm "efficiency" worked out for AMD, I wouldn't be surprised if I was correct. Hell, a bigger die like Pascal is more efficient than Polaris.
  • godrilla - Monday, August 1, 2016 - link

    The Nano was pretty impressive, and it seems AMD will be first with an HBM 2.0 consumer card. Plus, performance per watt is effectively increasing with age thanks to low-level APIs, rather than decreasing the way past generations have.
  • hojnikb - Monday, August 1, 2016 - link

    If nvidia ever decided to underclock the shit out of something like titan x, they could just as easily reach some next level efficiency.
  • emn13 - Monday, August 1, 2016 - link

    Judging by forum reports, AMD's latest cards are at least as unreasonably clocked. Many people seem to be able to get large savings with almost no performance loss with just a little undervolting/underclocking.

    It's weird - their reputation as less efficient seems like something they'd be glad to lose, yet steps that are so trivial anyone can do them at home aren't taken.
  • looncraz - Monday, August 1, 2016 - link

    AMD's easiest fix for their power usage would be to step the memory frequency with the core frequency. This would be most noticeable while watching videos, as AMD's current drivers just ramp the memory frequency to max clocks any time there is a GPU load.

    I underclock my RAM to just 625 MHz (it's stable at 550 MHz, but I like a little margin) and save ~25-40W while watching YouTube videos. That means my R9 290 is only pulling some 20W or so to watch a video.
  • retrospooty - Monday, August 1, 2016 - link

    AMD/ATI and Nvidia have traded that top spot many times in the past 15 years. Nvidia has had it for the past few years, but that can always change back again (again).
  • sharath.naik - Monday, August 1, 2016 - link

    Not for the next few years, their polaris GPU turned out to be even more inefficient.
  • Yojimbo - Monday, August 1, 2016 - link

    ATI had a bigger market share than NVIDIA just once, for about a year in 2H '04 and 1H '05. The peak was 56% ATI and 42% NVIDIA. AMD never has. Sometimes AMD/ATI has had the higher-performing top card, if that's what you mean, but the original comment was about the overall architecture. In the last two years NVIDIA has opened up a huge lead, with a peak of about 80% NVIDIA and 20% AMD, now down to 77% NVIDIA, 23% AMD.
  • StrangerGuy - Monday, August 1, 2016 - link

    The fact that the Radeon 4xxx to 7xxx series hit a home run in perf/price yet got punished with less market share versus Nvidia over the same period speaks volumes about how toxic the AMD brand is in the minds of consumers; ATI as a brand had much more favorable mindshare, and dropping it in favor of AMD was a stupendously dumb decision.
  • Yojimbo - Tuesday, August 2, 2016 - link

    Ehh, I could be wrong, but from what I remember AMD had driver issues at the time. NVIDIA has just consistently had a more complete and polished product lineup. Certain cards or generations may have turned out better than others for AMD, but you're gonna be less likely to buy a product when the previous generation had problems X and Y. A company must execute consistently to maintain enough momentum to win market share in gaming graphics, because the brands are very well known by the buyers. In that sense it is a sort of branding issue. But it's not a marketing issue, unlike what a lot of people tend to imply when they disdainfully dismiss it as a branding issue.
  • wumpus - Wednesday, August 3, 2016 - link

    I'm pretty sure Intel's slice (volume, not revenue) is bigger than even nVidia's. AMD is getting squeezed on both sides, a steadily decreasing chunk of a steadily decreasing market.
  • Mr.AMD - Monday, August 1, 2016 - link

    That is a nice assumption, but very wrong.
    AMD will show true performance at the higher levels by releasing the Vega 10 and 11 GPUs.
    Performance/watt/dollar is going to be almost perfect for AMD; rumor has it AMD will PAPER launch Vega in October. I truly can't wait for Vega, because the performance will be very high on this 16nm FF node. Better OC, better stock performance, better everything.
  • TessellatedGuy - Monday, August 1, 2016 - link

    Future technology is obviously gonna be better. And who knows - Polaris was a big fail in efficiency, and that could happen to Vega as well. You made a nice assumption as well, but it could be very wrong too.
  • looncraz - Monday, August 1, 2016 - link

    Polaris efficiency is hampered by 8 inefficient GDDR5 controllers. The 110W GPU TDP would be closer to 90W without GDDR5.

    The RX480 VRM uses about 20W, and the RAM uses about 30W.

    With HBM2 the GPU TDP would be 95W, RAM would use 10W, and the VRM could use 15W.

    That would be 120W total, vs 160W.
  • Scali - Monday, August 1, 2016 - link

    "Polaris efficiency is hampered by 8 inefficient GDDR5 controllers."

    GTX1070 has the exact same memory, same 150W TDP, yet delivers a lot more performance.
    AMD is clearly doing something wrong in terms of efficiency. They can move to HBM2, but so can nVidia.
  • JeffFlanagan - Monday, August 1, 2016 - link

    Your post seemed really odd, like propaganda really. Then I noticed your user name.
  • Chaser - Tuesday, August 2, 2016 - link

    What color are those glasses?
  • StrangerGuy - Tuesday, August 2, 2016 - link

    Yeah, I'm sure every AMD fanboy was saying the same thing about Bulldozer back in 2011 too.
  • Yojimbo - Tuesday, August 2, 2016 - link

    My guess is Vega will just be the Polaris architecture with more shaders. It needs the memory bandwidth of HBM whereas Pascal does not, because Pascal is a more memory bandwidth efficient architecture than Polaris. It'll have better efficiency than Polaris because of the HBM 2, but not nearly as good as Pascal because although the memory subsystem on Vega will be using less power than that of Pascal, the rest of the GPU will be using a lot more. They'll probably water cool it again so they can get the thermals necessary to run the card fast enough while staying under a reasonable TDP. In that case it will have similar issues to the Fury line. AMD needs Navi in a bad way.
  • Jedi2155 - Monday, August 1, 2016 - link

    The funny thing is that this same technology was used to whoop the GeForce 2 GTS back in the day :)

    I recall the Hercules 3D Prophet 4500 was ~$150 versus ~$250 for the GeForce 2 GTS, yet the GTS got spanked by the tile-based renderer.
    http://www.anandtech.com/show/735/10
  • Alexvrb - Monday, August 1, 2016 - link

    I had one. It was a great card for the money. You needed a decent CPU to go with it. Unfortunately they fell behind and had to withdraw. It's a shame as the 5500 had hardware T&L and DDR, along with a wider design and higher clocks. It basically nearly tripled the Kyro II / 4500 in terms of fillrate and memory bandwidth. There were rumors of DX8 support and 128MB variants. But desktop graphics was a tough market to break into even back then.

    Interesting tidbit, they developed drivers with fairly efficient software T&L for the 4800 (Kyro II SE) which was delayed and eventually cancelled. However they did release said drivers for free to users of existing models as a nice going away present.
  • Cygni - Tuesday, August 2, 2016 - link

    I believe quite a few Kyro II SE boards still exist and pop up in collector circles occasionally and are actually functional. Gotta be up there with the various Voodoo 5 6000 dev boards as far as collectible cards go, due to the fact that they actually work.

    I remember reading the Kyro II review on this very website and thinking that tile-based was the future. Well, it's 15 years later, and Nvidia has finally made the move. Intel has been on board for a long time, as well as all the cellphone and mobile designs, so I guess all that's really left is AMD/ATI.
  • asendra - Monday, August 1, 2016 - link

    Well, this explains how Kepler cards have been tanking so much in performance lately.
    They obviously weren't being optimised specifically for, but now it seems they weren't even getting any of the more "general" optimisations.
  • Scali - Monday, August 1, 2016 - link

    What are you talking about? This article is about how the *hardware* in Maxwell differs. You can't expect NVidia to optimize the hardware of Kepler, which is already in the hands of customers, can you?
  • asendra - Monday, August 1, 2016 - link

    mm, drivers? I think that was clear enough. Drivers are in constant flux, with general optimizations and specific optimizations for new games.
    Lately, Kepler cards have been losing relative performance, and AMD cards that were trailing Kepler when released have been performing much better in newer games.

    Also, the gap between Kepler and Maxwell cards has kept increasing.
  • Scali - Monday, August 1, 2016 - link

    Yes, I know you are talking about drivers. That's exactly my point: what do drivers have to do with this article?
  • Yojimbo - Tuesday, August 2, 2016 - link

    http://www.hardwarecanucks.com/forum/hardware-canu...

    Kepler performance isn't tanking.
  • milli - Monday, August 1, 2016 - link

    So PowerVR was right all along? No surprise there.
    The fact that Kyro, a 12M-transistor chip with less than half the bandwidth of the GeForce 256 DDR (a 23M-transistor chip), could get so close to it should have made the company more successful. No such luck.
  • jabber - Monday, August 1, 2016 - link

    I've been pushing and holding a candle for tile-based rendering since I had a PowerVR M3D card back around 1998. I remember MS stating big time, when the Xbox One was found to be quite a bit down on power compared to the PS4 before release, that it would use tile rendering. Seems that didn't work out... like the power of the cloud/off-site processing making the One many times more powerful.
  • Yojimbo - Monday, August 1, 2016 - link

    On Wikipedia it says that PowerVR uses tile-based deferred rendering. NVIDIA is using tile-based immediate rendering, which Wikipedia says Mali and Adreno also use.

    From what I read, there were image quality issues with tile-based solutions on the desktop in the past.
  • Alexvrb - Monday, August 1, 2016 - link

    Well there's different tile based rendering implementations. PVR's tile based differed rendering was the original on the desktop. I can't recall any IQ problems with their TBDR cards, and I owned two of them. In fact they produced really superb quality. They did have some limitations, but it wasn't due to the TBDR design itself. Anyway Nvidia is using immediate TBR instead of deferred. They each have their advantages but immediate is the safer/easier path in terms of compatibility and design. It also theoretically allows them to have a GPU that can use TBIR OR conventional IMR. Which may be something they could decide on a per-game basis.
  • Alexvrb - Monday, August 1, 2016 - link

    Deferred* ack. Anyway has anyone looked at the Series7XT Plus yet? I hope that gets more design wins... they nearly doubled the FLOPS vs the 7XT per cluster. If someone built a 16 cluster variant (a GT7900 Plus) for large tablet/hybrid use it could put out ~2TFLOPs at a 1Ghz clock. I'm no fan of Apple but an iPad Pro 2 with 12+ clusters of the Plus would be interesting from a technical standpoint.
  • StrangerGuy - Monday, August 1, 2016 - link

    Computing history is filled with less performing products as market victors.

    So your point is...?
  • nagi603 - Monday, August 1, 2016 - link

    Didn't Carmack say that tile-based solutions are the worst you could do in terms of VR?
  • BrokenCrayons - Monday, August 1, 2016 - link

    Possibly, but he's basically been Facebook's spokesperson for the Rift since he left id Software after accepting the golden toilet seat that only a company like Zuckerberg's could realistically offer someone who hasn't been doing anything in the industry for years aside from riding on the fame of a few decades-old games. Oddly enough, NV's Maxwell cards are standard bearers for VR capability in spite of AMD's efforts to promote their last-generation products as VR bliss.

    However, given both of the GPU companies aren't really trumpeting VR with this new generation of cards and the game releases thus far are essentially casual and indie titles (including releases like IKEA's kitchen designer) I think VR's current resurrection has already run its course and is losing a lot of momentum.

    Anyway, my point is not to take the words of a single, barely relevant has-been programmer as gospel. If anything, that guy is hardly worth the worship these days.
  • Michael Bay - Monday, August 1, 2016 - link

    >hasn't been doing anything in the industry for years

    You mistyped Sweeney.
  • BrokenCrayons - Monday, August 1, 2016 - link

    Sweeney...now there's a name I haven't seen in a good decade, maybe more. When I was running a little mom n' pop computer store in the late 90's and early 00's there were a couple of college student customers of ours that idolized that guy. They sunk a lot of money into our store, so their fervor was something I could overlook while listening to the cash drawer's bell.
  • kn00tcn - Monday, August 1, 2016 - link

    you didn't hear him last week talking about MS trying to kill Win32?
  • Alexvrb - Monday, August 1, 2016 - link

    Yeah I was just gonna say he has been very active... he's been whining, crying, pissing, AND moaning. For years. About the Windows Store and related subjects mostly. He just can not stand the thought of not having a near-monopoly on PC game downloads. I use Steam all the time but jeez man give it a rest. Apple and Google have stores, it's kind of required these days.

    Plus it's a great way to get family to download apps safely on their fancy touch laptops and tablets. "Just get it from the Store. Yes the little shopping bag icon. Type in Netflix."
  • kn00tcn - Monday, August 1, 2016 - link

    he was with Oculus quite a while before Facebook... & it doesn't have to be gospel, just a consideration that motivates people to run experiments, benchmarks, code changes
  • wumpus - Monday, August 1, 2016 - link

    Michael Abrash has been insisting that "latency is everything" in VR, even before he joined Oculus. http://blogs.valvesoftware.com/abrash/ for plenty about graphics and VR.

    Using tiling is going to add nearly a frame's worth of latency to VR (well, to everything, but nobody cares for non-VR uses). If you need to collect triangles and sort them into tiles, you are going to have wildly greater latency than if you just draw the triangles as they are called in the API. Vulkan/DirectX 12 should help, but only if you are willing to give up a lot of the benefits of tiling.
  • Scali - Tuesday, August 2, 2016 - link

    "Using tiling is going to add nearly a frame's worth of latency to VR"

    Firstly, no... I think you are confusing deferred rendering with tiling.
    Tiling is just the process of cutting up triangles into a tile-grid, and then you can process the tiles independently (even in parallel). This doesn't have to be buffered, it can be done on-the-fly.

    Secondly, latency and frame rendering time are not the same thing. Just because you have to collect triangles first doesn't mean it takes as long as it would to render them.
    It may take *some* time to collect them, there's a tiny bit of extra overhead there (depending also on how much of the process is hardware/cache-assisted). However, it allows you to then render the triangles more efficiently, so in various cases (especially with a lot of overdraw) you might actually complete the tiling+rendering faster than you would render the whole thing immediately. Which actually LOWERS your latency.

    Also, I don't quite see how Vulkan/DX12 would 'help', or how you would have to 'give up a lot of the benefits of tiling' by using these APIs.
    In terms of rendering, they are still doing exactly the same thing: they pass a set of shaders, textures and batches of triangles to the rendering pipeline.
    Nothing changes there. The difference is in how the data is prepared before it is handed off to the driver.

    A lot of people seem to think DX12/Vulkan are some kind of 'magic' new API, which somehow does things completely differently, and requires new GPUs 'designed for DX12/Vulkan' to benefit.
    Which is all... poppycock.
    The main things that are different are:
    1) Lower-level access to the driver/GPU, so resource-management and synchronization is now done explicitly in the application, rather than implicitly in the driver.
    2) Multiple command queues allow you to run graphics and compute tasks asynchronously/concurrently/in parallel.

    In terms of rasterization, texturing, shading etc, nothing changed. So nothing that would affect your choice of rasterizing, be that immediate, tiled, deferred, or whatever variation you can think of.
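    To make points 1 and 2 above concrete, here is a minimal Direct3D 12 sketch (Windows/MSVC, error handling omitted; it illustrates the API model, not any particular GPU's behavior): the application creates its own graphics and compute queues and synchronizes them explicitly with a fence, instead of relying on the driver to do it implicitly.

        #include <d3d12.h>
        #include <wrl/client.h>
        #include <cstdio>
        #pragma comment(lib, "d3d12.lib")
        using Microsoft::WRL::ComPtr;

        int main() {
            ComPtr<ID3D12Device> device;
            D3D12CreateDevice(nullptr, D3D_FEATURE_LEVEL_11_0, IID_PPV_ARGS(&device));

            // Point 2: multiple command queues -- one for graphics, one for compute.
            D3D12_COMMAND_QUEUE_DESC gfx = {}; gfx.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;
            D3D12_COMMAND_QUEUE_DESC cmp = {}; cmp.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
            ComPtr<ID3D12CommandQueue> gfxQueue, computeQueue;
            device->CreateCommandQueue(&gfx, IID_PPV_ARGS(&gfxQueue));
            device->CreateCommandQueue(&cmp, IID_PPV_ARGS(&computeQueue));

            // Point 1: synchronization is explicit. The app owns the fence and decides
            // when the compute queue may start relative to the graphics queue.
            ComPtr<ID3D12Fence> fence;
            device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence));

            // ... record and ExecuteCommandLists() on gfxQueue here ...
            gfxQueue->Signal(fence.Get(), 1);   // GPU raises the fence when graphics work is done
            computeQueue->Wait(fence.Get(), 1); // compute queue holds until that happens
            // ... ExecuteCommandLists() on computeQueue here ...

            std::printf("created DIRECT and COMPUTE queues plus a fence\n");
            return 0;
        }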
  • wumpus - Tuesday, August 2, 2016 - link

    The whole point about buffering is that it *will* take a full frame. And while you can do it in parallel, it won't speed anything up (you just draw in smaller tiles).

    A simpler argument against my brainfart is that you don't want to display the thing until you are done. So it doesn't matter what order you do it.

    The other thing is that eventually this type of thing can seriously improve latency (especially in VR headsets). What nvidia needs to do is twofold:

    1: create some sort of G-sync 2.0 (or G-sync VR, but I'm sure fanboys will run out and buy g-sync 2.0 200Hz displays or something). This should let them display each *line* as it appears, not just each frame. This will of course be mostly fake, since neither device really works on the line level, but the idea is to get them both in sync up to about quarter screens or so. Drawing the screen 1/4 of the time will reduce latency by whatever time it takes to gather up the API calls and arrange them for rasterization + 1/4 of a frame (or however large the tiles are. Pretty darn small for high antialiasing and HDR).
    2. Assuming that "gathering up the API calls" takes roughly as much time as the rastering (it should if they don't share hardware, otherwise they are wasting transistors), then get the engines to break the screen into horizontal strips and send the API calls separately. This should be easy at the high level (just change the vertical resolution and render 4-8 times while "looking" further downward), but likely a royal pain at the low level, getting rid of all the "out of sight, out of mind" assumptions about memory allocation and caching. But it buys you absolutely nothing if you can't "race the beam", and I doubt you can do that with current hardware with or without G-sync/FreeSync (I can't believe they needed G-sync in the first place).

    It might take awhile, but I suspect that the 3rd of 4th generation of VR will absolutely depend on this tech.
  • Scali - Tuesday, August 2, 2016 - link

    "The whole point about buffering is that it *will* take a full frame."

    Firstly, not in terms of time. Buffering the draw calls for a frame is faster than executing the draw calls for a frame.
    Secondly, you are assuming that they buffer a whole frame at a time. But you can buffer arbitrarily large or small subsections of frames, and render out intermediate results (even PowerVR does that, you have to, in order to handle arbitrarily complex frames).

    "And while you can do it in parallel, it won't speed anything up (you just draw in smaller tiles)."

    Actually, it can. If you design the hardware for it, you can split up your triangle into multiple tiles, and then have multiple rasterizers work on individual tiles in parallel. Which would not be too dissimilar to how nVidia currently handles tessellation and simultaneous multi-projection with their PolyMorph-architecture.
  • wumpus - Tuesday, August 2, 2016 - link

    Great post, but nothing like what I meant.

    The reason for assuming a latency hit was that there are two parts to rendering a frame when tiling: breaking down all the API calls into triangles to figure out which go with which tile, and then running the tiles (this gets a little weird in that, since it is technically not a "deferred renderer", it has to fake that it isn't exactly doing this).

    The only reason you wouldn't take a latency hit is if the bit that sorts API calls into tiles can operate while the rasterizer is rendering each tile. Obviously some API calls will stop things dead (anything that needs previously calculated pixels), but I suspect the sorter can simply mark it and keep sorting tiles.

    It still shouldn't be much of a hit (even if the API never asks for something that hasn't been tiled yet) simply because sorting the tiles should be a quick process, much faster than rasterizing them. Basically it should give at absolute worst case one frame times the "length of time to sort into tiles"/"length of time to draw the screen".

    DX12/Vulkan isn't an issue (unless you are playing games that create textures that aren't directly used - but that likely means writing some on-chip memory out to DRAM, creating the texture when needed, and writing it back to on-chip memory). It is simply a question of how long it takes to sort tiles, and whether you can increase framerate by overlapping them. I'd be shocked silly if you are claiming that Nvidia somehow did this at a significant framerate penalty because it has to sit on its hands while it sorts tiles.
  • Scali - Tuesday, August 2, 2016 - link

    I think the problem is still your idea of 'latency'.
    The 'latency' people worry about in VR is the time between the start of a frame (taking user input, preparing and sending draw calls to the API) and the moment that frame is actually displayed on screen.

    The 'latency' in the case of tile rendering is at a few levels lower in the abstraction. Yes, it may be possible that if you want to draw a single triangle, there's a slight bit of extra latency between sending the triangle to the driver and getting it on screen.
    However, the point of tile-based rendering is not to speed up the rendering of a single triangle, but rather to speed up the rendering of entire frames, which is millions of triangles, with lots of textures and complex shaders to be evaluated.

    So the equation for 'latency' in terms of VR is this:
    Total frame latency = tile preparation time + rendering time.

    Now, for an immediate mode renderer, 'tile preparation time' is 0.
    Let's say an immediate mode renderer takes N time rendering, so total frame latency is 0 + N = N.

    The tile-based renderer will have some non-zero tile preparation time, say K > 0.
    However, because the tiles remove redundant work and improve overall cache-coherency, the rendering time goes down. So its rendering time is L < N.
    Now, it may well be possible that K + L < N.
    In which case, it actually has *lower* total frame latency for VR purposes.
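    Plugging purely illustrative numbers into that inequality (these are not measurements of any real GPU):

        #include <cstdio>

        int main() {
            const double N = 12.0; // ms: immediate-mode render time for the whole frame
            const double K = 1.0;  // ms: time spent binning triangles into tiles
            const double L = 9.0;  // ms: tiled render time (less overdraw, better cache use)

            std::printf("immediate latency: %.1f ms, tiled latency: %.1f ms\n", N, K + L);
            // Here K + L < N, so the tiler finishes the frame sooner despite the binning step.
            return 0;
        }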

    "Obviously some API calls will stop things dead (anything that needs previously calculated pixels), "

    Again, thinking at the wrong level of abstraction.
    Such calls can only use buffers as a whole. So only the result of complete render passes, not of individual pixels or triangles.
    The whole tile-based rendering doesn't even apply here. All drawing is done (hit a fence) before you would take the next step. This is no different for immediate mode renderers.
  • HollyDOL - Monday, August 1, 2016 - link

    There are lots of notes about this technology coming from mobiles to GPUs... but wasn't tile rasterization implemented on PCs already back in times when mobile phones resembled a brick? Or is the current tile rasterization not related to the old PC one at all?
    Not trying to pick at words, just trying to understand why articles relate it to the mobile implementations and not the old PC one.
  • jabber - Monday, August 1, 2016 - link

    Yeah...even the Dreamcast used it. The late 1990's was a fun but frustrating time with all sorts of paths for 3D. https://www.youtube.com/watch?v=SJKradGC9ao
  • BrokenCrayons - Monday, August 1, 2016 - link

    The Dreamcast used a PowerVR graphics chip. Those things were tile-based even in PC form and their add-in card did pretty well throwing around original Unreal back in their heyday. When I was still kicking around an aging Diamond Viper V550 (nvidia's TNT graphics chip), I briefly considered a Kyro as an upgrade, but finally settled on a GeForce 256 when the DDR models finally fed them the memory bandwidth that SDR memory couldn't.
  • jabber - Monday, August 1, 2016 - link

    Yeah I loved playing Unreal with my 4MB Matrox Mystique/M3D setup. Looked really good.
  • BrokenCrayons - Monday, August 1, 2016 - link

    Yeah it was fantastic looking on PowerVR hardware. I'm not sure what it was about those cards (pretty sure mine was the same Matrox board... I recall thinking something along the lines of "What? That's it?" when pulling the card out of the box since there was so little on the PCB, just the chip and the two memory ICs) but I liked the visuals more than I did when Unreal was running under Glide.
  • jabber - Monday, August 1, 2016 - link

    Yeah tiny card...and no nasty image quality sapping analogue passthrough cable! I used that till I swapped over to a 3dFX Banshee when PowerVR lost the battle.
  • Ryan Smith - Monday, August 1, 2016 - link

    "but wasn't tile rasterization implemented on PCs already back in times when mobile phones resembled a brick?"

    If it makes it any clearer, I could write that it's a first for "video cards that weren't a market failure." Technically the early PowerVR desktop cards did it first, but ultimately they weren't successful on the market. Good idea, bad timing and poor execution.
  • HollyDOL - Monday, August 1, 2016 - link

    Ic, makes it clear, thx... I thought I was missing some major difference.
  • Scali - Monday, August 1, 2016 - link

    Sadly, it wasn't even so much the hardware or the drivers at fault, but rather the software.
    Thing with immediate renderers is that they render immediately. A deferred renderer such as the PowerVR had to buffer the draw calls until a scene was finished. Direct3D had specific BeginScene()/EndScene() functions to mark this, but developers were very sloppy with their usage.
    As a result, the driver could not accurately determine when it should render, and when it needs to preserve or flush the z-buffer (the PowerVR doesn't actually need a z-buffer in VRAM, the temporary tile-cache z-buffer is all it needs).
    This caused a lot of depth-sorting bugs. Not because the hardware was broken, not because the driver was broken, but because people didn't write proper D3D code. It just 'happened to work' on cards that render directly to VRAM.
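    For reference, this is what a properly bracketed render pass looks like in Direct3D 9 - a minimal windowed-mode sketch (error handling omitted); the comments about what a tiler might do at each point are illustrative, not a description of any specific driver:

        #include <windows.h>
        #include <d3d9.h>
        #pragma comment(lib, "d3d9.lib")

        LRESULT CALLBACK WndProc(HWND h, UINT m, WPARAM w, LPARAM l) {
            if (m == WM_DESTROY) { PostQuitMessage(0); return 0; }
            return DefWindowProcA(h, m, w, l);
        }

        int main() {
            WNDCLASSA wc = {};
            wc.lpfnWndProc = WndProc;
            wc.hInstance = GetModuleHandleA(nullptr);
            wc.lpszClassName = "d3d9pass";
            RegisterClassA(&wc);
            HWND hwnd = CreateWindowA("d3d9pass", "pass demo", WS_OVERLAPPEDWINDOW,
                                      0, 0, 640, 480, nullptr, nullptr, wc.hInstance, nullptr);

            IDirect3D9* d3d = Direct3DCreate9(D3D_SDK_VERSION);
            D3DPRESENT_PARAMETERS pp = {};
            pp.Windowed = TRUE;
            pp.SwapEffect = D3DSWAPEFFECT_DISCARD;
            pp.hDeviceWindow = hwnd;
            pp.EnableAutoDepthStencil = TRUE;
            pp.AutoDepthStencilFormat = D3DFMT_D24S8;
            IDirect3DDevice9* dev = nullptr;
            d3d->CreateDevice(D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL, hwnd,
                              D3DCREATE_HARDWARE_VERTEXPROCESSING, &pp, &dev);

            // One frame, with the render pass explicitly bracketed.
            dev->Clear(0, nullptr, D3DCLEAR_TARGET | D3DCLEAR_ZBUFFER,
                       D3DCOLOR_XRGB(0, 0, 0), 1.0f, 0);
            dev->BeginScene();  // start of the pass: a deferred/tiling driver can begin binning here
            // ... DrawPrimitive()/DrawIndexedPrimitive() calls go here ...
            dev->EndScene();    // end of the pass: a safe point to resolve tiles and discard temp Z
            dev->Present(nullptr, nullptr, nullptr, nullptr);

            dev->Release();
            d3d->Release();
            return 0;
        }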
  • invinciblegod - Monday, August 1, 2016 - link

    Isn't that what is innovative about Maxwell? They were able to implement it and the driver takes care of compatibility issues like the one you cite.
  • Scali - Monday, August 1, 2016 - link

    Well, firstly, I don't think they're doing the same as what PowerVR is doing.
    Secondly, PowerVR is now a major player in the mobile segment (powering every single iOS device out there, and also various other phones/tablets), the compatibility issues belong to a distant past.
  • wumpus - Wednesday, August 3, 2016 - link

    Just out of curiosity, is this why we "need" g/freesync? It seems to be the solution to a problem that never should have existed, but either GPUs spew bad frames or LCDs get lost when accepting frames during an update.
  • Scali - Wednesday, August 3, 2016 - link

    G-Sync is just to remove the legacy of the old CRT displays.
    CRTs literally scan the display, left-to-right and top-to-bottom, at a given fixed frequency. Historically, the video signal was driving the CRT electronics directly, so you had to be in sync with the CRT.

    LCDs just adopted that model, and buffered the frames internally. It was simple and effective. Initially just by digitizing the analog VGA signal that would normally drive a CRT. Later by a digital derivative in the form of DVI/HDMI.

    But now that we have more advanced LCDs and more advanced image processors, we can choose to refresh the image whenever the GPU has finished rendering one, eliminating the vsync/double/triple buffering issues.
  • silverblue - Tuesday, August 2, 2016 - link

    PowerVR 2 was delayed so that NEC/Videologic could manufacture enough for Dreamcast, meaning the PC launch of the Neon250 was after its competitors had caught up.

    The first Kyro was also horribly underclocked, requiring Kyro II to fix its shortcomings (175MHz instead of 115).
  • Yojimbo - Monday, August 1, 2016 - link

    I wonder if AMD would have to license IP to do tile-based rendering. NVIDIA has tile-based rendering IP from Gigapixel via 3dfx. AMD sold their Imageon IP to Qualcomm whereupon it was used for Adreno.
  • wumpus - Tuesday, August 2, 2016 - link

    3dfx bought Gigapixel in 2000, so any patents would have been applied for around then at the latest (how much do you think they were spending on R&D by that point?). If AMD starts designing a tiling chip right now, any Gigapixel patents will have safely expired before first silicon.
  • Yojimbo - Wednesday, August 3, 2016 - link

    Firstly, that's 4 more years without access to TBR (the patent term is 20 years). And 6 years since Maxwell came out. I'd bet that AMD knew quite a bit about NVIDIA's rasterization process soon after they got their hands on a GPU. I seriously doubt this public revelation told AMD anything they didn't already know. If a technique really is giving NVIDIA an advantage, they'd want to neutralize that advantage as soon as possible if they can, not wait up to 6 years.

    Secondly, you're making an assumption that they can do what they want with TBR with just the original Gigapixel patents. Other companies (Apple, ARM, Qualcomm, NVIDIA) have since filed for other patents dealing with TBR and if AMD has not already been researching TBR before now, they very well may need to be careful to not violate some of those patents as well as they go forward trying to apply TBR in modern architectures. So if whatever cross-patent deals they already have in place don't include TBR patents, they may need to seek new deals to safely pursue TBR.
  • Haroon90 - Monday, August 1, 2016 - link

    AMD stated their 2018 Navi architecture will be mobile first so I presume they will implement something similar to this.
  • Eden-K121D - Monday, August 1, 2016 - link

    Source?
  • Haroon90 - Monday, August 1, 2016 - link

    My mistake, they said it will be their first architecture with "scalability" in mind.

    What else could that mean besides mobile? Their GPUs already scale from servers to laptops, and judging by the power efficiency gains it's slated to have, that can only mean a mobile-centered design.
  • Yojimbo - Monday, August 1, 2016 - link

    Yeah, I think Navi is the next chance for AMD to be more competitive against NVIDIA. AMD hasn't been as nimble as NVIDIA, and I think their GCN architecture was designed for a different world than the one it's being used in. I would like to think Navi is a big architectural overhaul, the biggest for AMD since the original GCN came out in 2011.
  • tarqsharq - Monday, August 1, 2016 - link

    Considering AMD spends less than Nvidia on R&D, while also trying to compete with Intel on the CPU side with that same budget, the work they have done is impressive.
  • Yojimbo - Monday, August 1, 2016 - link

    Not sure how one makes an accurate judgment of such a situation. I personally have zero experience with knowing what's possible with smaller R&Ds in the semiconductor industry. I have no data to compare this situation to. But I don't find much impressive about AMD's recent GPUs. It would be impressive if they were still competitive with a lower R&D, but the fact that their market share has been cut in half makes it hard to be impressed. I have a feeling the major areas where NVIDIA spends more money than AMD might be with software and individual GPU design, and not as significantly with architecture design. When NVIDIA makes an architecture they seem to be able to apply it to their entire lineup much faster than AMD can.
  • J0hnnyBGood - Monday, August 1, 2016 - link

    Given that hardware development takes a long time, the recently reduced R&D budget will bite them in 2 to 3 years.
  • filenotfound - Monday, August 1, 2016 - link

    Nvidia had relatively weak performance in DX12 asynchronous compute, especially with Maxwell.
    Is this a direct impact of choosing "tile-based rasterization"?
    Or not at all?
  • Yojimbo - Monday, August 1, 2016 - link

    My uneducated guess is that is a shader scheduling issue, not a rasterization issue. NVIDIA used an inflexible scheduling method for mixed (graphics/compute) workloads in Maxwell. Pascal uses a better method that allows for dynamic balancing. The reason Polaris gets a larger speed boost than Pascal with asynchronous compute enabled compared with async disabled is probably because there's more 'air' in AMD's pipelines and so more resources available for asynchronous compute to take advantage of. In other words AMD's architecture is utilized less efficiently to begin with so more efficiency gain is available to be realized through asynchronous compute.
  • Scali - Monday, August 1, 2016 - link

    "NVIDIA used an inflexible scheduling method for mixed (graphics/compute) workloads in Maxwell."

    That is correct. When you run graphics and compute together, Maxwell splits up into two 'partitions', allocating some SMs to each partition.
    As long as you balance your workload so that both the graphics and compute work complete at around the same time, this can work nicely.
    However, since it cannot repartition 'on the fly', some SMs will sit idle once their work is done, until the other SMs have completed as well and the GPU can schedule new work/repartition the SMs.

    So in theory, you can get gains from async compute on Maxwell. In practice it's even more difficult to tune for performance than GCN and Pascal already are.
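    To make that concrete, here is a toy Python model of the difference. The numbers, the 16-SM count and the 50/50 split are all invented purely for illustration, not taken from any real GPU:

        def static_partition(gfx_work, cmp_work, gfx_sms, cmp_sms):
            # Each partition finishes on its own; the GPU only repartitions
            # once BOTH are done, so the faster side sits idle in the meantime.
            t_gfx = gfx_work / gfx_sms
            t_cmp = cmp_work / cmp_sms
            total = max(t_gfx, t_cmp)
            idle = abs(t_gfx - t_cmp) * (gfx_sms if t_gfx < t_cmp else cmp_sms)
            return total, idle

        def dynamic_balance(gfx_work, cmp_work, total_sms):
            # Idealised dynamic balancing: any SM picks up whatever work remains.
            return (gfx_work + cmp_work) / total_sms

        # A poorly balanced case: graphics is 3x the compute work, split 50/50.
        t_static, idle = static_partition(gfx_work=300, cmp_work=100, gfx_sms=8, cmp_sms=8)
        t_dynamic = dynamic_balance(gfx_work=300, cmp_work=100, total_sms=16)
        print(f"static:  {t_static:.1f} time units, {idle:.0f} SM-time-units idle")  # 37.5 / 200
        print(f"dynamic: {t_dynamic:.1f} time units")                                # 25.0

    Balance the two workloads well and the static split loses almost nothing; get it wrong and the sketch shows where the stalls come from.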
  • killeak - Monday, August 1, 2016 - link

    The truth is that async compute lets you use otherwise idle compute units, and GCN has a lot more raw ALU power than nVidia (Maxwell), but its typical utilization is much lower. That's why a Fury X (8.60 TFLOPS) competes with the 980 Ti (5.63 TFLOPS), and the 290X (also 5.63 TFLOPS) with the 980 (4.61 TFLOPS).

    So, even if Maxwell could execute mixed compute and graphics wavefronts the way GCN does, the amount of unused ALU power is smaller.

    The fact that nVidia uses 32-thread warps versus GCN's 64-thread wavefronts is also part of the reason why nVidia keeps its units busier.

    Rasterization has nothing to do with async compute.
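    As a quick back-of-the-envelope on the peak numbers quoted above (this assumes roughly equal delivered game performance within each pair, so treat the 'implied utilization' as nothing more than a crude proxy):

        pairs = {
            "Fury X vs 980 Ti": (8.60, 5.63),
            "290X vs 980":      (5.63, 4.61),
        }

        for name, (amd_tflops, nv_tflops) in pairs.items():
            implied_util = nv_tflops / amd_tflops   # crude proxy for GCN ALU usage
            headroom = 1.0 - implied_util           # what async compute could reclaim
            print(f"{name}: ~{implied_util:.0%} implied utilization, ~{headroom:.0%} headroom")

    By that rough measure about a third of the Fury X's peak ALU throughput goes unused, which is exactly the headroom async compute has to play with.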
  • J0hnnyBGood - Monday, August 1, 2016 - link

    As I understand it, the ROPs are fixed-function while compute runs on the shaders, so it shouldn't be related.
  • Scali - Monday, August 1, 2016 - link

    Intel has historically used a form of tile-based rendering as well, or 'Zone rendering', as they call it: http://www.intel.com/design/chipsets/applnots/3026...
    Not sure if they still do, but I don't see why not. iGPUs are generally more bandwidth-restricted than dGPUs, so it makes even more sense there.
  • Yojimbo - Monday, August 1, 2016 - link

    Yeah, I wish that video compared more architectures than just Maxwell and TeraScale 2.
  • Yojimbo - Monday, August 1, 2016 - link

    Or rather, Maxwell, Pascal, and TeraScale 2.
  • J0hnnyBGood - Monday, August 1, 2016 - link

    Back when that pdf was up to date, Intel licensed PowerVR and didn't make their own GPUs.
  • Scali - Monday, August 1, 2016 - link

    That is not entirely correct.
    Intel licensed PowerVR technology, but only for their Atom-based systems.
    The desktop/notebook GPUs, such as the ones discussed in this paper, were their own design. Also, Zone Rendering is quite different from PowerVR's TBDR.
  • Manch - Tuesday, August 2, 2016 - link

    Weren't Imageon and Adreno at one point ATI and AMD/ATI GPU designs/technologies that were sold off?
  • Mr.AMD - Monday, August 1, 2016 - link

    Now the billion-dollar question: can NVIDIA do Async?
    Many say they can; I say, and many sites support me on this, NOPE.
    Rasterization and pre-emption are OK and all, but will never deliver the same performance as Async compute by AMD hardware.
  • JeffFlanagan - Monday, August 1, 2016 - link

    You're not a credible person. You have to know that.
  • Yojimbo - Monday, August 1, 2016 - link

    Amen, brutha. Keep preachin' the word...
  • Scali - Monday, August 1, 2016 - link

    "can NVIDIA do Async"
    ...
    "will never deliver the same performance as Async compute by AMD hardware"

    ^^ Those are two distinct things.

    By applying your logic, AMD can't do 3D graphics, because they don't have the same performance as NVidia hardware.
  • HollyDOL - Monday, August 1, 2016 - link

    As async compute on nV hardware doesn't throw a NotImplementedException, the answer to your question is obvious.

    Whether nV's implementation of async compute is as effective as AMD's is the question you should be asking.
  • garbagedisposal - Monday, August 1, 2016 - link

    Nobody cares, they don't need async to fuck AMD. Can you guys ban this retard?
  • Michael Bay - Tuesday, August 2, 2016 - link

    Nonono, good amd fanatics are rare nowadays.
  • ZeDestructor - Tuesday, August 2, 2016 - link

    Well, this one isn't very good, so, yknow...
  • wumpus - Tuesday, August 2, 2016 - link

    Here's a hint: if something is any good you don't give it a name like "async compute". Anything asynchronous is the bane of all engineers who have to work with it, and typically the symptom of a failed design. Looks like it is working out as well as "heterogeneous computing", which was a much better idea, but software guys are too wimpy to bother when Moore has been doing their job for them since 1965 (might be a good time to start learning OpenGL, guys).
  • extide - Monday, August 1, 2016 - link

    Wow this is a cool little piece, but honestly, a fresh new Dadid Kanter article, OMG!! YAY!
  • extide - Monday, August 1, 2016 - link

    David Kanter, of course
  • Dr. Swag - Monday, August 1, 2016 - link

    Any word on when the rx 480 and GTX 1060 review will be out? Pretty stoked especially for the rx 480 review, since I can't seem to find any deep dive on the Polaris architecture anywhere! Save us Ryan!
  • RT81 - Monday, August 1, 2016 - link

    Could that "practical means to overcome the drawbacks" be related to drivers? If PC software is expecting to see one type of rendering, but Maxwell does it differently and it works *well*, could it be drivers? I'm just wondering if this type of thing is why Nvidia spends so much time with developers optimizing drivers for games. They're hardware does differently and they want to ensure their drivers enable the software to take full advantage of it.
  • RT81 - Monday, August 1, 2016 - link

    Should be "Their hardware does rendering differently..."
  • kn00tcn - Monday, August 1, 2016 - link

    but that time has also been spent for years on the older generations
  • telemarker - Monday, August 1, 2016 - link

    FYI, watch this hotchips presentation for how the tiling is done on MALI
    https://youtu.be/HAlbAj-iVbE?t=1h5m
  • Tigran - Monday, August 1, 2016 - link

    Is it correct to say that Maxwell’s tile based rasterization is a hardware version of Giga3D?
  • Morawka - Monday, August 1, 2016 - link

    Maybe Nvidia stayed quiet about this because Imagination holds the patent!

    I see a lawsuit in a few years.

    https://www.google.com/patents/US8310487
  • Yojimbo - Monday, August 1, 2016 - link

    NVIDIA has tile-based rendering patents from Gigapixel via 3dfx.
  • jabbadap - Monday, August 1, 2016 - link

    Nvidia has lots of tile-based rendering patents. Maybe one of the most interesting is also one of the newest:
    https://www.google.com/patents/US20140118366

    The terms are quite familiar from nvidia's modern architectures.
  • Yojimbo - Tuesday, August 2, 2016 - link

    Yes, and jumping through the citations and 'referenced by' links a bit, I see other patents mentioning tiling by NVIDIA, ARM, Qualcomm, Intel, Apple, and Imagination, but not AMD. AMD and NVIDIA have a patent cross-licensing deal, though. Not sure how that affects things.
  • Jedi2155 - Monday, August 1, 2016 - link

    I'm glad Nvidia has finally figured out how to make use of a decades old technology!
    http://www.anandtech.com/show/735/3

    Also kinda shows you how old I am xD
  • kn00tcn - Monday, August 1, 2016 - link

    i dunno, you could have still read anand at age 10, it's still a potentially large age range
  • wumpus - Wednesday, August 3, 2016 - link

    Oddly enough, before AMD bought ATI it seemed that all the "cool" tech that never went anywhere was bought up by ATI. And no, at least none of the "cool" tech I was interested in ever showed up in ATI or AMD products that I was aware of.

    Some examples: the "bicmos" high-speed PowerPC design, and the leading "mediaprocessor" company (mediaprocessors' work would eventually be done by GPUs, but via a completely different design, and the patents likely didn't help).

    Now that I think about it, it is *possible* that the "bicmos" patents were used by *Intel* (who has access to AMD's patents) to come up with the Pentium 4, which is about the only time that whole bit actually helped AMD.
  • MobiusPizza - Tuesday, August 2, 2016 - link

    So what are the disadvantages of tile based rasterization? Does this affect the IQ? How good is Nvidia in mitigating the disadvantages? These are the questions that need answering

    People in the comments are all praising Nvidia for this efficiency, but until we know whether this change affects IQ, we can't say whether it's a benign optimization or blatant cheating.
  • Scali - Tuesday, August 2, 2016 - link

    "So what are the disadvantages of tile based rasterization? Does this affect the IQ? How good is Nvidia in mitigating the disadvantages? These are the questions that need answering"

    I think they were already answered by the fact that NVidia has been doing this since Maxwell v2 at least, so the hardware had been on the market for about 2 years before people found out that NVidia renders in a slightly unconventional way.
    If it affected IQ or had any other disadvantages, people would long have noticed.

    Aside from that, you can figure out for yourself that IQ is not affected at all. The order in which (parts of) triangles are rasterized does not affect the end result in a z-buffered renderer. If it did, the depth-sorting problems would be immediately apparent.
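    You can convince yourself of that with a few lines of Python. This assumes opaque, z-tested fragments with distinct depths (blending and equal depths are the usual caveats, and they apply to any renderer, tiled or not):

        from itertools import permutations

        def resolve(fragments):
            # Standard LESS depth test: keep the nearest fragment seen so far.
            depth, color = float("inf"), None
            for frag_depth, frag_color in fragments:
                if frag_depth < depth:
                    depth, color = frag_depth, frag_color
            return depth, color

        fragments = [(0.7, "red"), (0.3, "green"), (0.5, "blue")]
        results = {resolve(order) for order in permutations(fragments)}
        assert len(results) == 1   # the same pixel comes out of all 6 orderings
        print(results)             # {(0.3, 'green')}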
  • owan - Tuesday, August 2, 2016 - link

    Your questions are basically exactly the same ones I was asking myself after reading this article, as no mention of the actual disadvantages is made anywhere. I'm curious as to what hurdles they actually had to overcome to implement this and don't feel like watching a 20 minute video when 30s of text reading would have sufficed.

    The second part of your reply is a complete non sequitur though. If it works, it works. Nobody has commented on IQ issues or anything related to it, and the cards have been on the market for years. "Blatant cheating"? Why are you even bringing that up? If the end result is the same (or better) quality than a competitive method, why would you even think to call it cheating? Because it's faster than the other company's method? This isn't a sport, there aren't "rules" in the same way, as long as it works. If the ball goes in the net I don't give a crap if you throw it or kick it.
  • Wolfpup - Tuesday, August 2, 2016 - link

    Wow, thanks for this article! It's been a while since I've been to realworldtech (I just don't visit websites like I did 10-20 years ago).

    Really interesting...maybe tile based rendering is winning after all, after dying off on PC nearly two decades ago!
  • Wolfpup - Tuesday, August 2, 2016 - link

    I would have loved to see what this looks like on Kepler, just as verification that this was a change in Maxwell... and a couple of generations of Intel hardware too.
  • Scali - Wednesday, August 3, 2016 - link

    Yup, Maxwell v1 (GTX750), Kepler, perhaps even Fermi... See how far we have to go back to see different behaviour.
  • LordConrad - Tuesday, August 2, 2016 - link

    Might this also explain why DirectX 12 and Vulkan work better on AMD cards?
  • Wolfpup - Tuesday, August 2, 2016 - link

    Eh?
  • Scali - Tuesday, August 2, 2016 - link

    Has nothing to do with it, see my post above.
  • Scali - Tuesday, August 2, 2016 - link

    Sorry, forgot to paste link to my post: http://www.anandtech.com/comments/10536/nvidia-max...
  • tuxRoller - Wednesday, August 3, 2016 - link

    No, David Kanter didn't provide any such evidence, his beliefs notwithstanding.
    A number of folks chimed in to say that it looks like "clever thread scheduling", but that's about it.

    http://www.realworldtech.com/forum/?threadid=15987...

    http://www.realworldtech.com/forum/?threadid=15987...

    http://www.realworldtech.com/forum/?threadid=15987...
  • Scali - Wednesday, August 3, 2016 - link

    "A number of folks chimed in to say that it looks like "clever thread scheduling", but that's about."

    Since apparently the threads are scheduled in a tile-arrangement, what exactly is the difference between saying it's 'clever thread scheduling' and 'tile-based' in this case?
    This patent seems to describe what is going on here: https://www.google.com/patents/US20140118366
    It basically describes a system where tiles are used as cache, and a set of primitives is processed per-tile (which is different from a pure immediate renderer, which may divide a primitive over quads or even tiles, but does not process multiple primitives at the same time).
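    For anyone who wants a feel for what 'a set of primitives processed per-tile' means, here is a rough Python sketch of the generic binning approach. The tile size, the bounding-box test and everything else here are simplifications for illustration, not a claim about how nVidia's hardware actually does it:

        TILE = 16  # tile size in pixels, arbitrary for this sketch

        def bin_primitives(primitives):
            # Sort a batch of primitives (given as screen-space bounding boxes)
            # into the screen tiles they overlap.
            bins = {}
            for prim_id, (x0, y0, x1, y1) in enumerate(primitives):
                for ty in range(y0 // TILE, y1 // TILE + 1):
                    for tx in range(x0 // TILE, x1 // TILE + 1):
                        bins.setdefault((tx, ty), []).append(prim_id)
            return bins

        def render(primitives):
            for tile, prim_ids in sorted(bin_primitives(primitives).items()):
                # Colour/depth traffic for this tile can stay in on-chip storage;
                # only the finished tile needs to be written back to DRAM.
                print(f"tile {tile}: rasterize primitives {prim_ids}")

        # Two triangles described by their bounding boxes (x0, y0, x1, y1).
        render([(0, 0, 20, 20), (10, 10, 40, 30)])

    The key point is that these bins act more like a cache over a batch of primitives than a capture of the whole scene, which is why the chip still behaves like an immediate-mode renderer to the outside world.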
  • wumpus - Wednesday, August 3, 2016 - link

    Sounds like people are too hung up on "immediate vs. deferred". If it is seen by the programmer as immediate, it has to allow the programmer to read rasterized pixels that the GPU has been instructed to execute regardless of location and write (presumably texture or filter) the results in a different location. This means that the GPU isn't completely picky about the tiles and is effectively using them for caching.

    But the tiles are plainly obvious in David Kanter's demo. And it seems that some commenters simply insist that "tile" has a well defined technical usage that goes beyond "a small portion of a frame buffer [typically square] that is rasterized at the same time". There are plenty of such comments, but I can't see them pointing to such a specific definition anywhere.
  • Scali - Wednesday, August 3, 2016 - link

    Thing is, to the programmer, even a deferred renderer looks exactly the same. You pass it some triangles to render, and whether it renders them immediately or to a tile first, and then flushes them to VRAM, a draw call is an atomic operation from the programmer's perspective, so you can't tell the difference.
  • tuxRoller - Friday, September 9, 2016 - link

    Yes, that is true, but we're talking about what the hardware itself is actually doing vs. the guarantees the driver is able to make to the developer.
  • versesuvius - Thursday, August 4, 2016 - link

    This is very intriguing, while stinking to high heaven at the same time. So, is this tiling done in hardware or software? If the technique is implemented in hardware and enshrined in patents, then why would Nvidia keep it a secret and not even drop the slightest hint about it? Mobile chip makers have not kept it a secret, so why would Nvidia?

    Other than that, if it is implemented in hardware, then why would Microsoft even think about tiling in the first place? Microsoft is not a hardware company. Maybe someday it could force hardware changes, but even then the changes or improvements would have to be implemented by hardware companies, not Microsoft, a software company. So, the fact that Microsoft has even considered it points to tricks in software. Given that, it all seems more likely that the tiling is done by software, namely Nvidia drivers (which are possibly in cahoots with other drivers). After all, it sounds like the old DOS way of doing games and graphics, i.e. preparing the entire page (by tiles or all at once) and then displaying it.

    Also, it is just plain foolish to think that AMD did not have a clue about this method. After all, AMD would be playing a losing game for ever and eternity. No matter how good its design was and would be in the future, Nvidia would have a lead over it. So, why play the game at all?
  • Scali - Thursday, August 4, 2016 - link

    Lolwat? Why are some people so hung up on 'software'? Sounds like the exact same argument I heard before... "nVidia does async compute in software".
    Total nonsense. As if you could perform tiling in software efficiently enough to reach the performance levels that nVidia does (hint: Pascal-based GPUs are by far the fastest GPUs on the market. Heck, even if they did do it in software, who cares?).

    I also don't see why you are pulling Microsoft into this. What point are you even trying to make? Maxwell and Pascal-based cards are fully backward compatible with earlier versions of Direct3D and OpenGL, which obviously could not have had any specific built-in functionality to cater for this type of rendering. Ergo, nVidia's approach to rendering is fully transparent at the API level.

    AMD most probably knows enough about the theory of tile-based rendering. Just as they probably know all the theory behind Intel's CPUs, hyperthreading and all. The problem is translating that to an actual chip with good performance.
  • versesuvius - Thursday, August 4, 2016 - link

    When Nvidia leaves a void as big as that, someone is bound to fill it with whatever one chooses. But why are you pulling the "compute" into this? Tiling is about games and rendering to display. Nvidia chips have never been any better than AMD in "compute", async or not async. That is why the Linux community always prefers AMD over Nvidia every time it has anything to offer in a particular class of hardware. The point is that if it is in hardware, why keep quiet about it? Nvidia has never been shy about advertising its technology and its hardware, so why keep quiet about this one for over two generations of hardware? Isn't it because it is nonexistent in the hardware? I am not pulling Microsoft into this. Microsoft was in it already, and just long enough, while never making any claim to being a hardware company, except slapping their logo on mice and keyboards, or ever declaring any intention of turning into one. You don't care? So what! That is your privilege. But I think Nvidia should make things regarding this tiling business clear very soon, or a lot of people will start to care very strongly about this. Intel is an entirely different matter. Their business is not driven by drivers, just hardware.
  • Scali - Friday, August 5, 2016 - link

    "When Nvidia leaves a void as big as that"

    What are you even talking about?

    "But why are you pulling the "compute" into this?"

    Because it's another instance of broken AMD fanboy rhetoric.

    "Nividia chips have never been any better than AMD in "compute", async or not async."

    You're funny. Back when NVidia launched the GeForce 8800, AMD had no answer whatsoever for a long time.
    It wasn't until years later, when OpenCL finally arrived, that AMD even became an option for compute.

    "every time it has anything to offer in a particular class of hardware."

    Sadly it's been a while since AMD has offered anything in the higher classes of hardware.

    "I am not pulling Microsoft into this."

    Yes you are, you said: "So, the fact that Microsoft has even considered it points to tricks in software."

    What are you even talking about here? What did Microsoft even consider?

    "But I think Nvidia should make things regrading this tiling business clear very soon"

    Why? It works, it performs better than any alternative on the market.
    That's not exactly a good reason for nVidia to share all their trade secrets, is it?

    "or a lot of people will start to care very strongly about this."

    Why would they care? You're the only one here who's crying foul, and it sounds like you're a jealous AMD fanboy who can't take it that nVidia does something smart and efficient that AMD can't do. So somehow you have to make nVidia look evil.
    Everyone else here seems to be perfectly happy with the fact that nVidia is doing something different, and that it apparently works very well in practice.
  • versesuvius - Friday, August 5, 2016 - link

    Now it is a trade secret? Fixing the market is a trade secret? Making roads that only one type of car can run on is a trade secret? That must be the greatest trade secret of all. It is no secret that AMD cards have much more longevity than Nvidia's. They may not surpass Nvidia at first, but they stay good for far longer with new games and OS revisions and updates than Nvidia cards ever do. Fanboy all you like. Nvidia will have to clear this up sooner or later, or answer for it eventually.
  • Scali - Friday, August 5, 2016 - link

    What are you even talking about? Tiled rendering is completely transparent to the API and applications. It is just a slightly different approach to rendering, which can have certain benefits for efficiency.
    Too many angry brainwashed AMD zombies such as yourself, littering up every discussion online.
  • versesuvius - Friday, August 5, 2016 - link

    Now it is transparent? Upfront and before anything else, Ryan Smith says that Maxwell and Pascal have been the greatest mystery of his career, and you say it is transparent? Then why is Nvidia so quiet about it? Not a word about this transparent technology that gives Nvidia its advantage only on the Microsoft OS and nowhere else, and surprise, surprise, that is where the bulk of Nvidia's profits come from. Nvidia cannot pretend honesty and transparency in this case. The only thing that can possibly compel Nvidia to keep quiet about this is the existence of another partner in the success of this so-called "tiling technique". Nvidia must come clean about this tiling business. There is no way around it.
  • Scali - Friday, August 5, 2016 - link

    Transparent to the API and applications, as in, the GPU does exactly the same as any other GPU, as far as the API and application are concerned.
    Also, what's your problem?
    And on Microsoft's OS, nowhere else... say what? Nothing you say even makes sense.
  • versesuvius - Saturday, August 6, 2016 - link

    Which API? DirectX? Windows API? Nvidia drivers, which granted are not strictly APIs, but can return NULL or wait or deliver or call into another API. In fact what you are saying does not make any sense. You say Nvidia has found a way that makes tiling possible without the programmer knowing anything about it. Fine. Where is it? In the hardware? The answer is a resounding NO. In the software? You don't know. Does it cooperate with other parts of the system? You don't know. And you call the sum of your blissful ignorance, Nvidia's trade secret, or transparent API. That is not nearly enough. Your attitude only helps Nvidia to get away with what is clearly another of its unsavory practices. Well, pay and be happy ever after. But Nvidia will have to answer for it sooner or later.
  • JiggeryPokery - Saturday, August 6, 2016 - link

    You don't seem to have the slightest idea what you're talking about. This is just about tile-based rendering; there's nothing dodgy or underhanded about it. It's just another technique for rendering that has its own benefits and drawbacks. TBR has been around for many years and is even used today by numerous mobile gfx chips from the likes of ARM and PowerVR.
  • versesuvius - Saturday, August 6, 2016 - link

    Oh, yes. Now I remember. Thank you.
  • Scali - Saturday, August 6, 2016 - link

    "You say Nvidia has found a way that makes tiling possible without the programmer knowing anything about it."

    As JiggeryPokery already said, tile-based rendering has been around for ages. Intel has done it, ARM does it, Imagination Tech (PowerVR) does it, and even AMD has some form of tile-rendering.

    "Where is it? In the hardware?"

    Yes, it is in the hardware, which you could have seen if you bothered to check out the patent linked above: https://www.google.com/patents/US20140118366

    "Does it cooperate with other parts of the system? You don't know."

    Erm, what kind of questions are these even? Maxwell has been on the market for about 2 years, and runs all Direct3D, OpenGL and Vulkan software you throw at it, from Windows or linux. Obviously it works just fine, people didn't even notice anything unusual going on, because as already said: the implementation works transparently to the API and applications (so to 'other parts of the system').

    "Your attitude only helps Nvidia to get away with what is clearly another of its unsavory practices."

    How exactly is this even unsavoury? To the end user it works the same as older nVidia GPUs or AMD GPUs, it just makes it a bit more efficient, yielding higher performance and lower power consumption. Sounds like a win-win to me.

    "But Nvidia will have to answer for it sooner or later."

    For what? Making GPUs with the highest performance and best performance-per-watt?
  • versesuvius - Saturday, August 6, 2016 - link

    Because while all those companies have been using it or working on it and have not been quiet about it, Nvidia has kept quiet. If it is a trade secret, why has everybody else talked openly about it and applied for patents for the technology while Nvidia has not? Is it a matter of not naming the devil, so its competitor will not know and then try to do it and remove the advantage that Nvidia has over it? Hardly. As you say, a lot of people have known about this and have been using this technique, even Google, again a software company. I also wonder why no one has put the question to Nvidia after this discovery either. This is intriguing stuff, but not only Nvidia but also all the tech sites have forgotten about it. It is transparent to the API, but that does not mean that programmers cannot purposefully make some good use of it. Why is Nvidia quiet about it? Is there shame in applying for such a valuable patent? Is there shame in announcing their prowess and how far they are ahead of the game? Of course not. The thing is that there are people like you who find honor in what Nvidia is doing, and Nvidia has perhaps always counted on that and thus developed into what it is now.

    As for the Google patent, it sounds like bullshit to me. More like something filed by a patent troll.
  • Scali - Saturday, August 6, 2016 - link

    "As for the Google patent, it sounds like bullshit to me. More like something filed by a patent troll."

    Google just provides a service to search and view patents.
    If you bother to look, you see that it links directly to the US Patent Office. It's the real thing.
    I think I know who is doing the trolling here...
  • versesuvius - Sunday, August 7, 2016 - link

    If you say so. The problem was and is that I am posting from Iran, and since we are under so many American sanctions, including the sanctions that Google, at the behest of the US government, has put on us, the page does not load completely. Different parts of the page are fetched from different servers, and some of those servers may be shut off to traffic to and from Iran, and some are filtered out by Google due to sanctions. The nastiest case is when the style sheet is missing. However, the abstract came through, and it still sounds like bullshit, or, given the American patent system, maybe a defense against other patents.

    On the other hand, you who are obviously under no such sanctions have not come up with any explanations as to why Nvidia is so quiet about this technology, while everybody else has been quite clear about it. And while you are at it (at the risk of violating the righteous, mighty, whatever American government), who did file that patent?
  • Scali - Sunday, August 7, 2016 - link

    "Problems was and is, that I am posting from Iran"

    You're funny. Apparently you cannot even access all the information that is around on the internet, yet you post with an arrogant and all-knowing attitude, and sling all sorts of accusations around, which you can't possibly base on proper information, since you cannot access this information.
    A normal person would not hold and defend strong opinions about things they know too little of. In fact, they'd say upfront that they do not have access to this information, or have not researched the topic in-depth.

    "On the other hand, you who are obviously under no such sanctions have not come up with any explanations as to why Nvidia is so quiet about this technology, while everybody else has been quite clear about it."

    You're turning things around. There is no reason why NVidia needs to disclose every detail of their implementation. No vendor ever does.
    Some vendors may focus more on tile-based rendering, because especially in the case of Imagination Tech, it is their primary strength. They target the mobile market, and their approach can significantly improve power efficiency.
    For desktop cards it is not that relevant. And since Imagination Tech flopped on the desktop and gave tile-based rendering a bad reputation there, you'd first have to prove that it actually is a valid approach, before you can promote it as a feature.
    I think nVidia has done exactly that: They have been using it for a few years now, in GPUs that were very successful.

    "who did file that patent?"

    Inventors: HAKURA; Ziyad S.; (Gilroy, CA) ; DIMITROV; Rouslan; (San Carlos, CA)
    Applicant: NVIDIA Corporation, Santa Clara, CA, US
    Assignee: NVIDIA Corporation, Santa Clara, CA
  • versesuvius - Sunday, August 7, 2016 - link

    I am not turning anything around. I just told you why I do not have the details of the patent you linked to.

    So, that is what Nvidia has been doing for some years (date of the patent?), in fact two generations of its GPUs (and, according to you, "proving that it is a valid approach", LOL!), and all the while AMD has been sitting on its bottom and doing nothing about it, although there was an Nvidia patent for it for so long, and AMD knew very clearly the general direction it had to take and did not move in that direction. Next thing you know, you are apt to say that what Mr. Hakura and Mr. Dimitrov have achieved is the only way there is in the universe to do tiling with modern PC GPUs, and that it would be a hopeless waste of resources for AMD and also Intel, dumbass engineers that they are, to implement tiling on their GPU systems and reap the enormous benefits it brings.

    Still, I think that the basic ideas put in that patent are bullshit, just a play on some technical terms, and that they have never materialized in any Nvidia chip, and I still think that Nvidia has to at least make a general statement about tiling on its GPUs. Of course for now everybody seems happy not to ask, and it will have to wait for another day. But in the meantime there will be closer looks at Nvidia GPUs and the systems that they operate with.
  • Scali - Sunday, August 7, 2016 - link

    "I am not turning anything around. I just told you why I do not have the details of the patent you linked to."

    The patents were already linked by someone else earlier in the thread. Besides, you kept attacking these patents, and only now admitted you can't actually see the details.

    "all the while AMD has been sitting on its bottom and doing nothing about it"

    AMD has enough problems, there's a reason why their cards have been little more than rehashes and rebadges of GCN for years now, why they were late even with basic features such as HDMI 2.0, and why they are the only player on the desktop that still has no FL12_1 support.
    They are probably moving in that direction, they are just not moving as quickly as you think.

    "Still, I think that the basic ideas put in that patent is bullshit and just a play with some technical terms and that it has never materialized in any Nvidia chip"

    This is what is known as 'conjecture'. I don't know why you even bother to post this sort of stuff.

    "and I still think that Nvidia has to at least make a general statement about tiling on its GPUs."

    You never gave any valid reason why though. NVidia, or any other vendor for that matter, has no obligation to share every detail of their products with the general public.

    "But in the meantime there will be closer looks at Nvidia GPUs and the systems that they operate with."

    Why? What reason could you possibly think of that warrants a "closer look" than what people have normally been doing whenever a new GPU arrives?
    I mean, you're talking as if NVidia's GPUs are fundamentally broken or whatever, while in reality they have been doing this for 2 years before people wrote a test application and discovered it.
    If they had never discovered it, would that have changed anything? No it wouldn't. All games that people have tried to run on these GPUs over the past 2 years have worked fine.
    So really. why are you acting like this?
  • versesuvius - Monday, August 8, 2016 - link

    "The patents were already linked by someone else earlier ..."

    Now, who is conjecturing there? Or are you just feeling punctual? Do you read everything, including comment pages that can add up to 50 pages at times? If so, good on you. I wish you joy of it.

    Anyways, you must know by now what the argument is about. I'll repeat it for you; maybe you will finally understand. There is nothing in the Nvidia hardware that is put in there to enable the tiling technique that the Nvidia graphics system uses. Nothing in the hardware. It is either the Nvidia drivers alone, or their drivers cooperating with the Windows drivers, that gives it its advantage in games on Windows, and only in games on Windows. Microsoft has been installing back doors in various parts of Windows before, so why not install one for Nvidia? It would not even have to be a back door, just a pinhole.

    I do not have a silicon map of an Nvidia GPU to show you, and nobody else does either. That is THE trade secret. However, there is nothing in the world that should have kept Nvidia from making a point about this, especially something that a lot of people have worked on before and is no special idea to begin with. To say that a company worked on it over a decade ago and could not make it work on the Windows desktop, and so Nvidia is right not to talk about it now, is quite a stretch.

    What could go wrong if Nvidia had made it clear that their GPU systems use tiling, so that any programmer working on games could utilize this wonderful technique to speed up their games? Then it would not have to muddy its name with "Nvidia, That is how ..." in some games' opening screens, while every programmer would code for Nvidia cards from the beginning. That is not how it happened. The only conclusion is that Nvidia is not honest about this and has been doing something wrongful. Now, I cannot prove it. Yet everything is there for anyone who wants to follow the money.
  • igot1forya - Monday, August 8, 2016 - link

    I was literally just thinking about this technology (and the PowerVR Kyro) recently: "Everyone is racing to add more and more memory to their graphics cards, why not just bring this tech back and reduce the memory demand?" Good to see that what is old is new again!
