Qualcomm's New Snapdragon S4: MSM8960 & Krait Architecture Explored

by Brian Klug & Anand Lal Shimpi on October 7, 2011 12:35 PM EST

The Adreno 225 GPU

Qualcomm has historically been pretty silent about its GPU architectures. You'll notice that specific details of Adreno GPU execution resources have been absent from most of our SoC comparisons. Starting with MSM8960, however, that is beginning to change.

The MSM8960 uses a current generation Adreno GPU with a couple of changes. Qualcomm calls this GPU the Adreno 225, a follow-on to Adreno 220. Subsequent Krait designs will use Adreno 3xx GPUs based on a brand new architecture.

As we discussed in our Samsung Galaxy S 2 review, Qualcomm's Adreno architecture is a tile based immediate mode renderer with early-z rejection. By Qualcomm's own admission, Adreno is somewhere in the middle of the rendering spectrum between IMRs and Imagination Technologies' TBDR architectures. One key difference is Adreno's tiling isn't as fine grained as IMG's.

Architecturally the Adreno 225 and 220 are identical. Adreno 2xx is a DX9-class unified shader design. There's a ton of compute on-board with eight 4-wide vector units and eight scalar units. Each 4-wide vector unit is capable of a maximum of 8 MADs per clock, while each scalar unit is similarly capable of 2 MADs per clock. That works out to 160 floating point operations per clock, or 32 GFLOPS at 200MHz.

Update: Qualcomm has clarified the capabilities of its 4-wide vector ALUs. Similar to the PowerVR SGX 543, each 4-wide vector ALU is capable of four MADs (one per component). The scalar units cannot be combined to do any MADs. Although they are helpful, we haven't really been tracking those in this table (IMG has something similar), so we've excluded them for now.

Mobile SoC GPU Comparison
	Adreno 225	PowerVR SGX 540	PowerVR SGX 543	PowerVR SGX 543MP2	Mali-400 MP4	GeForce ULP	Kal-El GeForce
SIMD Name	-	USSE	USSE2	USSE2	Core	Core	Core
# of SIMDs	8	4	4	8	4 + 1	8	12
MADs per SIMD	4	2	4	4	4 / 2	1	?
Total MADs	32	8	16	32	18	8	?
GFLOPS @ 200MHz	12.8 GFLOPS	3.2 GFLOPS	6.4 GFLOPS	12.8 GFLOPS	7.2 GFLOPS	3.2 GFLOPS	?
GFLOPS @ 300MHz	19.2 GFLOPS	4.8 GFLOPS	9.6 GFLOPS	19.2 GFLOPS	10.8 GFLOPS	4.8 GFLOPS	?
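
The GFLOPS rows in the table follow from simple arithmetic: total MADs, times two floating point operations per MAD, times clock speed. The short Python sketch below reproduces those figures. The names `GPUS`, `total_mads`, and `peak_gflops` are made up for illustration, and the reading of the Mali-400 MP4's "4 + 1" SIMDs at "4 / 2" MADs as four 4-MAD cores plus one 2-MAD core is an assumption that happens to match the table's total of 18. Kal-El is left out because its per-SIMD figure is unknown.

```python
# Peak theoretical throughput: each MAD (multiply-add) is counted as two
# floating point operations per clock.
FLOPS_PER_MAD = 2

# (SIMD count, MADs per SIMD) groups per GPU, taken from the table.
# Assumption: Mali-400 MP4's "4 + 1" / "4 / 2" entry means four 4-MAD
# cores plus one 2-MAD core, which matches the listed total of 18 MADs.
GPUS = {
    "Adreno 225": [(8, 4)],
    "PowerVR SGX 540": [(4, 2)],
    "PowerVR SGX 543": [(4, 4)],
    "PowerVR SGX 543MP2": [(8, 4)],
    "Mali-400 MP4": [(4, 4), (1, 2)],
    "GeForce ULP": [(8, 1)],
}


def total_mads(groups):
    """Sum MADs per clock across all SIMD groups of one GPU."""
    return sum(count * mads for count, mads in groups)


def peak_gflops(mads, clock_mhz):
    """Peak GFLOPS for a given MAD count at a given clock in MHz."""
    return mads * FLOPS_PER_MAD * clock_mhz / 1000.0


if __name__ == "__main__":
    for name, groups in GPUS.items():
        mads = total_mads(groups)
        print(f"{name}: {mads} MADs, "
              f"{peak_gflops(mads, 200):.1f} GFLOPS @ 200MHz, "
              f"{peak_gflops(mads, 300):.1f} GFLOPS @ 300MHz")
```

Plugging in the pre-update figures (eight vector units at 8 MADs plus eight scalar units at 2 MADs, or 80 MADs in total) gives the 32 GFLOPS at 200MHz quoted before Qualcomm's clarification. These are peak numbers only and say nothing about how well a shader compiler keeps the ALUs fed.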

Looking at the table above, you'll see that this is the same amount of computing power as even IMG's PowerVR SGX 543MP2. However, as we've already seen in our tests, Adreno 220 isn't anywhere near as quick.

Shader compiler efficiency and data requirements to actually populate a Vec4+1 array are both unknowns, and I suspect both significantly gate overall Adreno performance. There's also the fact that the Adreno 22x family only has two TMUs compared to four in the 543MP2, limiting texturing performance. Combine that with the fact that most Adreno 220 GPUs have been designed into single-channel memory controller systems and you've got a recipe for tons of compute potential limited by other bottlenecks.

With Adreno 225 Qualcomm improves performance along two vectors, the first being clock speed. While Adreno 220 (used in the MSM8660) ran at 266MHz, Adreno 225 runs at 400MHz thanks to 28nm. Secondly, Qualcomm tells us Adreno 225 is accompanied by "significant driver improvements". Keeping in mind the sheer amount of compute potential of the Adreno 22x family, it only makes sense that driver improvements could unlock a lot of performance. Qualcomm expects the 225 to be 50% faster than the outgoing 220.
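
For a rough sense of scale, the clock bump can be put into numbers. The sketch below assumes the corrected figure of 32 MADs per clock (eight vector ALUs at four MADs each) applies to both parts, since the 220 and 225 are architecturally identical. The variable and function names are illustrative only, and the results are peak theoretical throughput, not measured performance.

```python
MADS = 32          # 8 vector ALUs x 4 MADs each, per Qualcomm's clarification
FLOPS_PER_MAD = 2  # one multiply plus one add


def peak_gflops(clock_mhz, mads=MADS):
    """Peak GFLOPS at a given clock in MHz."""
    return mads * FLOPS_PER_MAD * clock_mhz / 1000.0


adreno_220_mhz = 266  # MSM8660
adreno_225_mhz = 400  # MSM8960

# Fractional clock increase going from Adreno 220 to Adreno 225.
clock_gain = adreno_225_mhz / adreno_220_mhz - 1

if __name__ == "__main__":
    print(f"Adreno 220 peak: {peak_gflops(adreno_220_mhz):.1f} GFLOPS")
    print(f"Adreno 225 peak: {peak_gflops(adreno_225_mhz):.1f} GFLOPS")
    print(f"Clock increase:  {clock_gain:.0%}")
```

The clock increase alone works out to roughly 50%, which lines up with Qualcomm's expected gain, although Qualcomm hasn't said how much of that estimate comes from clocks and how much from drivers.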

Qualcomm claims that MSM8960 will be able to outperform Apple's A5 in GLBenchmark 2.x at qHD resolutions. We'll have to wait until we have shipping devices in hand to really put that claim to the test, but if true it's good news for Krait as the A5 continues to be the high end benchmark for mobile GPU performance.

While Adreno 225 is only Direct3D feature level 9_3 compliant, Qualcomm insisted that when the time is right it will have a D3D11 capable GPU using its own IP - putting to rest rumors of Qualcomm looking to license a third party GPU in order to be competitive in Windows 8 designs. Although Qualcomm committed to delivering D3D11 support, it didn't commit to a timeframe.

108 Comments

metafor - Friday, October 7, 2011 - link
Scorpion does support dual-channel; however, the 8x60 series does not have two controllers. The 8x55/7x30 does, however, and in most cases is used in the configuration you described in the article.
ArunDemeure - Friday, October 7, 2011 - link
I knew MSM7x30/8x55 was dual-channel but I thought it was also available as a 64-bit LPDDR2 PoP solution? While it makes sense for most people to use it as single-channel LPDDR2 as opposed to dual-channel LPDDR1 these days, why would anyone ever have used both PoP and non-PoP DRAM at the same time? Maybe that old leaked presentation on Baidu listing all the MSM7x30/8x55 packages is wrong though.
metafor - Friday, October 7, 2011 - link
A single Scorpion and Adreno 205 just didn't need both channels. It makes more sense for a lot of OEM's to use single 32-bit LPDDR2.
ArunDemeure - Friday, October 7, 2011 - link
Hmm, that would certainly be news to me, it's possible but you'd still need a second memory controller and PHY so it makes very little sense. I can see a few possibilities:
- The LPDDR2 and DDR2 subsystems aren't shared so in theory for tablets you could do 32-bit SiP LPDDR2+32-bit off-chip DDR2. Seems weird but not impossible.
- You can do 32-bit ISM+32-bit PoP. Once again, why do this? Were they limited by package pins with a 0.4mm pitch? Seems unlikely with a 14x14 package but who knows.
- You can genuinely do 32-bit PoP+32-bit on the PCB. Still seems really weird to me.

The MSM7200(A) had a separate small LPDDR1 chip (16-bit bus with SiP) reserved mostly for the baseband while the primary OS-accessible DRAM was off-chip. This was obviously rather expensive (fwiw Qualcomm only 'won' that generation on software and weak competition IMO), and they removed it on the MSM7227 to reduce cost (making the chip's memory arbitration more complex). I'm not sure about the QSD8650; maybe it still optionally had that extra memory bus (SiP-only) but it was more flexible and never used. It's hard to find that kind of info.

Cheers,
Arun
mythun.chandra - Friday, October 7, 2011 - link
Anand,

Isn't this what I had pinged you about earlier?
z0mb13n3d - Friday, October 7, 2011 - link
I suggest you look into the facts before passing such statements.

I don't know where you or the OP are getting your information from (3GHz A15's, quad 2.5GHz Kraits hitting next year, Kraits using HKMG etc.), but that's been pretty inaccurate. All you're doing is speculating based on bits and pieces floating around in PDF's and slides. I still remember one of his claims from the previous thread '2x A15's > 4xA9's'. While no one in their right mind would dispute that the wider, deeper, single A15 is better than a single A9, to make such an uninformed, blanket statement (and to back it up with useless DMIPS numbers!) just doesn't bode very well.
ArunDemeure - Friday, October 7, 2011 - link
ST-Ericsson has publicly indicated the A9600's A15s can run at up to 2.5GHz, and GlobalFoundries has publicly said that the A9600 uses their 28nm SLP process which uses High-K but not SiGe strain. Is it really hard to believe a 28HPM or 28HP A15 could easily reach 3GHz? I'm not sure anyone will do that in the phone/tablet market, but remember ARM also wants A15 to target slightly larger Windows 8 notebooks and (I'm not as optimistic about this) servers.

As for Krait, Qualcomm's initial PR mentioned 2.5GHz (not just random slides) and APQ8064 is on TSMC 28HPM which uses High-K. If you don't trust either me or metafor on that, Qualcomm has also publicly stated that most of their chips will run on SiON but that they were considering High-K for chips running at 2GHz or above: http://semimd.com/blog/2011/02/07/qualcomm-shies-a...

As for 2xA15 vs 4xA9, metafor's point is that most applications are still not sufficiently multithreaded. It has very little to do with DMIPS which is a worthless outdated benchmark (not that Coremark is perfect mind you - where oh where is my SPECInt for handhelds? Development platforms could support enough RAM to run it by now). Unlike him I think 4xA9 should be relatively competitive even if clearly inferior in some important cases, and as you imply it's a difficult and even fairly subjective topic, but I don't think metafor's opinion is unreasonable.
z0mb13n3d - Friday, October 7, 2011 - link
That is the point I'm trying to make! Semiconductor companies, by virtue of the fact that they have to sign OEM/ODM deals before they really even have working products almost always posture about how much their designs can go 'up to' or 'indicate' ratings and numbers. My beef with the earlier thread was that statements were being passed on as facts based purely on stuff posted in press releases. I can tell you, for a fact, that no 2.5GHz Krait (dual or quad) based product will be shipping in '12. I can also tell you for a fact that you will not see anything more than 1.8-2.2GHz (optimistic) in shipping A15's for mobile devices. I understand the A15 architecture is capable of much more, but to try and draw comparisons between a near-shipping mobile-spec quad-core A9 and an on-paper 3GHz A15 powering servers is not correct!

If you did follow the previous thread closely, you will see that this was the only point I was trying to get across, in vain. No matter how you slice and dice it, the 2xA15 > 4xA9 argument is wrong. This is very similar to what we're seeing in the x86 market with Intel and AMD where the older, tri and quad core AMD's are still able to keep-up with or beat dual-core Intel's in threaded situations. Now it is an entirely different argument as to whether or not Google/MS/whoever else makes effective use of multi-core CPU's in their current mobile platforms and their relatively crude/simple kernels (as compared to desktop operating systems), but come Windows 8, I am willing to bet that quad core (or multi-core in general) SoC's will prove their worth.
ArunDemeure - Friday, October 7, 2011 - link
ST-E could underdeliver on the A9600, sure, but they've got a better process than OMAP5 and enough clever power saving tricks up their sleeve (some of which still aren't public) that I feel it's quite likely they won't. Remember 2.5GHz is only their peak frequency when a single core is on - they have not disclosed their throttling algorithms (which will certainly be more aggressive for everyone in the 28nm generation, especially on smartphone SKUs as opposed to tablets where higher TDPs are acceptable).

Also multiple companies will be making A15s on 28HPM eventually. TSMC has indicated they have a lot of interest in HPM, and that should certainly clock at least 25% higher than GF's Gate-First Non-SiGe 28SLP. However the problem is that the A15 is quite power hungry, so I expect people will use that frequency headroom to undervolt and reduce power although a few might expose it with a TurboBoost-like mechanism. On the other hand, exposing the full 3GHz for Windows 8 on ARM mini-notebooks should be a breeze, and I don't see why you'd expect that to be a problem.

As for 2.5GHz Quad-Core Krait in 2012 - I think they're still on schedule for tablets in late 2012, but then again NVIDIA was still on schedule for tablets in August 2011 back in February, so it's impossible to predict these things. Delays happen, and it'd be foolish not to take metafor seriously simply because he is unable to predict the unpredictable.

Finally, 2xA15 vs 4xA9... metafor's point is that given the lower maturity of multithreading on handheld devices, it's more like high-end quad-core Intel CPUs beating eight-core AMD CPUs in the real world. As I said I'm not sure I agree, but it's fairly reasonable.
dagamer34 - Saturday, October 8, 2011 - link
I doubt it was a delay as much as nVidia being boastful. They're quite known for that.
