AMD’s Kaveri: Pre-Launch Information
by Ian Cutress on January 6, 2014 8:00 PM EST
On the back of AMD’s Tech Day at CES 2014, all of which was under NDA until the launch of Kaveri, AMD have supplied us with some information that we can talk about today. For those not following the AMD roadmap, Kaveri is the natural progression of the AMD A-Series APU line, from Llano, Trinity to Richland and now Kaveri. At the heart of the AMD APU design is the combination of CPU cores (‘Bulldozer’, ‘Steamroller’) and a large dollop of GPU cores for on-chip graphics prowess.
Kaveri is the next iteration in that line. It moves from Richland’s FM2 socket to the updated FM2+ socket, and the architecture is updated for Q1 2014. AMD are attacking with Kaveri on four fronts:
Redesigned Compute Cores* (Compute = CPU + GPU)
Kaveri uses an enhanced version of the Richland CPU core, codename Steamroller. As with every new CPU generation or architecture update, the main goal is better performance and lower power – preferably both. AMD is quoting a 20% better x86 IPC with Kaveri compared to Richland when put clock to clock. For the purposes of this information release, we were provided with several AMD benchmarking results to share:
These results border pretty much on the synthetic – AMD did not give any real world examples today but numbers will come through in time. AMD is set to release two CPUs on January 14th (date provided in our pre-release slide deck), namely the A10-7700K and the A10-7850K. Some of the specifications were also provided:
AMD APUs | Richland A8-6600K | Richland A10-6800K | Kaveri A10-7700K | Kaveri A10-7850K |
Release | June 4 '13 | June 4 '13 | Jan 14th '14 | Jan 14th '14 |
Frequency | 3900 MHz | 4100 MHz | ? | 3700 MHz |
Turbo | 4200 MHz | 4400 MHz | ? | ? |
DRAM | DDR3-1866 | DDR3-2133 | DDR3-2133 | DDR3-2133 |
Microarchitecture | Piledriver | Piledriver | Steamroller | Steamroller |
Manufacturing Process | 32nm | 32nm | ? | ? |
Modules | 2 | 2 | ? | 2 |
Threads | 4 | 4 | ? | 4 |
Socket | FM2 | FM2 | FM2+ | FM2+ |
L1 Cache | 2 x 64 KB I$, 4 x 16 KB D$ | 2 x 64 KB I$, 4 x 16 KB D$ | ? | ? |
L2 Cache | 2 x 2 MB | 2 x 2 MB | ? | ? |
Integrated GPU | HD 8570D | HD 8670D | R7 | R7 |
IGP Cores | 256 | 384 | ? | 512 |
IGP Architecture | Cayman | Cayman | GCN | GCN |
IGP Frequency | 844 MHz | 844 MHz | ? | 720 MHz |
Power | 100W | 100W | ? | 95W |
All the values marked ‘?’ have not been confirmed at this point, although it is interesting to see that the CPU clock speed has decreased from Richland. A lot of the APU die goes to the integrated GPU, which as we can see above becomes fully GCN, rather than the Cayman-derived graphics of the Richland APUs. This comes with a core bump as well: the high end model has 512 GPU cores, which equates to 8 CUs on die and what AMD calls ‘12 Compute Cores’ overall. These GCN cores are primed and AMD Mantle ready, suggesting that performance gains could be had directly from Mantle enabled titles.
Described in AMD’s own words: ‘A compute core is an HSA-enabled hardware block that is programmable (CPU, GPU or other processing element), capable of running at least one process in its own context and virtual memory space, independently from other cores. A GPU Core is a GCN-based hardware block containing a dedicated scheduler that feeds four 16-wide SIMD vector processors, a scalar processor, local data registers and data share memory, a branch & message processor, 16 texture fetch or load/store units, four texture filter units, and a texture cache. A GPU Core can independently execute work-groups consisting of 64 work items in parallel.’ This suggests that if we were to run asynchronous kernels on the AMD APU, we could technically run twelve on the high end APU, given that each Compute Core is capable of running at least one process in its own context and virtual memory space independent of the others.
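The ‘12 Compute Cores’ figure can be checked against AMD’s own definition with some quick arithmetic. The sketch below is only an illustration: the function names are made up for this example, and it assumes that each of the four CPU threads counts as one CPU compute core, which is what the 8 CU plus 12 total figures imply.

```python
# Figures quoted in the article for the A10-7850K.
IGP_SHADER_CORES = 512    # "IGP Cores" row of the spec table
SIMDS_PER_GPU_CORE = 4    # "four 16-wide SIMD vector processors"
LANES_PER_SIMD = 16
CPU_CORES = 4             # assumption: one compute core per CPU thread (2 modules, 4 threads)


def gpu_compute_units(shader_cores: int) -> int:
    """Number of GCN GPU cores (CUs) for a given shader count."""
    lanes_per_cu = SIMDS_PER_GPU_CORE * LANES_PER_SIMD  # 4 x 16 = 64 lanes per CU
    if shader_cores % lanes_per_cu != 0:
        raise ValueError("shader count is not a whole number of CUs")
    return shader_cores // lanes_per_cu


def total_compute_cores(shader_cores: int, cpu_cores: int) -> int:
    """AMD-style 'Compute Cores' total: CPU cores plus GPU CUs."""
    return cpu_cores + gpu_compute_units(shader_cores)


if __name__ == "__main__":
    print("GPU CUs:", gpu_compute_units(IGP_SHADER_CORES))
    print("Compute Cores:", total_compute_cores(IGP_SHADER_CORES, CPU_CORES))
```

With the quoted numbers this gives 8 GPU CUs and 12 Compute Cores, matching AMD’s marketing figure.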
The reason AMD calls them Compute Cores comes from the second prong of its four pronged attack: hUMA.
HSA, hUMA, and all that jazz
AMD went for the heterogeneous system architecture early on to exploit the fact that many compute intensive tasks can be offloaded to parts of the chip that are designed to run them faster or at lower power. By combining CPU and GPU on a single die, the system should be able to shift work around to complete the process quicker. When this was first envisaged, AMD had two issues: lack of software out in the public domain to take advantage (as with any new computing paradigm) and restrictive OS support. Now that Windows 8 is built to support HSA, all that leaves is the programming. However, AMD have gone one step further with hUMA, giving the system access to all the memory, all of the time, from any location:
Now that Kaveri offers a proper HSA stack, and can call upon 12 compute cores to do work, applications that are designed (or have code paths) to take advantage of this should emerge. One such example that AMD are willing to share today is stock calculation using LibreOffice's Calc application – calculating the BETA (return) of 21 fake stocks and plotting 100 points on a graph of each stock. With HSA acceleration on, the system performed the task in 0.12 seconds, compared to 0.99 seconds when turned off.
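To put AMD’s two LibreOffice timings in perspective, the speedup can be worked out directly. This is a minimal sketch using only the two numbers quoted above; the helper names are hypothetical and the result says nothing about other workloads.

```python
def speedup(baseline_s: float, accelerated_s: float) -> float:
    """How many times faster the accelerated run is than the baseline."""
    if baseline_s <= 0 or accelerated_s <= 0:
        raise ValueError("timings must be positive")
    return baseline_s / accelerated_s


def time_saved_fraction(baseline_s: float, accelerated_s: float) -> float:
    """Fraction of the baseline run time removed by acceleration."""
    return 1.0 - accelerated_s / baseline_s


if __name__ == "__main__":
    hsa_off_s = 0.99  # AMD-quoted time with HSA acceleration off
    hsa_on_s = 0.12   # AMD-quoted time with HSA acceleration on
    print(f"Speedup: {speedup(hsa_off_s, hsa_on_s):.2f}x")
    print(f"Time saved: {time_saved_fraction(hsa_off_s, hsa_on_s):.1%}")
```

On AMD’s figures that is a little over eight times faster, although this is a single vendor-supplied data point.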
Prong 3: Gaming Technologies
In a year where new gaming technologies are at the forefront of design, along with gaming power, AMD are tackling the issue on one front with Kaveri. By giving it a GCN graphics backbone, features from the main GPU line can fully integrate (with HSA) into the APU. As we have seen in previous AMD releases and talks, this means several things:
- Mantle
- AMD TrueAudio
- PCIe Gen 3
AMD wants to revolutionize the way that games are played and shown with Mantle. It is a small shame that the Mantle release was delayed and that AMD did not provide any numbers to share with us today. The results should find their way online after release, however.
Prong 4: Power Optimisations
With Richland we had CPUs in the range of 65W to 100W, and using the architecture in the FX range produced CPUs up to 220W. Technically we had 45W Richland APUs launch, but to date I have not seen one for sale. However this time around, AMD are focusing on a slightly lower power segment: 45W to 95W. Chances are the top end APUs (A10-7850K) will be 95W, suggesting that we have a combination of a 20% IPC improvement, a 400 MHz decrease and a 5% TDP decrease for the high end chip. Bundle in some HSA and let’s get this thing on the road.
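As a rough sanity check on how those three figures combine, here is a back-of-the-envelope sketch. It assumes CPU throughput scales linearly with IPC times base clock, ignores turbo (the Kaveri turbo clocks are unconfirmed) and takes AMD’s 20% IPC claim at face value; the function names are invented for the example.

```python
def relative_throughput(ipc_gain: float, new_mhz: float, old_mhz: float) -> float:
    """Estimated new/old CPU throughput, assuming performance ~ IPC x clock."""
    return ipc_gain * (new_mhz / old_mhz)


def relative_change(new: float, old: float) -> float:
    """Fractional change from old to new (negative means a decrease)."""
    return (new - old) / old


if __name__ == "__main__":
    # A10-6800K (Richland) vs A10-7850K (Kaveri), base clocks from the spec table.
    estimate = relative_throughput(ipc_gain=1.20, new_mhz=3700, old_mhz=4100)
    tdp_change = relative_change(new=95, old=100)
    print(f"Estimated x86 throughput vs A10-6800K: {estimate:.3f}x")
    print(f"TDP change: {tdp_change:+.0%}")
```

Under those assumptions the A10-7850K would land at roughly 1.08x the A10-6800K at base clocks, for 5% less TDP. Real results will depend on turbo behaviour and on how well the 20% claim holds across workloads.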
Release Date
AMD have given us the release date for the APUs: January 14th will see the launch of the A10-7850K and the A10-7700K. Certain system builders should be offering pre-built systems based on these APUs from today as well.
133 Comments
artk2219 - Tuesday, January 7, 2014 - link
Whenever anything graphics related comes into play. The i5 is a great CPU with a very meh iGPU, and terrible drivers to boot, so that isn't helping anything. Richland currently smashes HD 4600, and Kaveri will pull ahead even more so. The only thing Intel has that's faster than Richland is Iris Pro, and that only comes on $400+ processors. Plus it's easy to be fast when you have 64 to 128 MB of on-die memory.
theduckofdeath - Saturday, January 18, 2014 - link
If you're buying an overclockable Intel processor, it is not very likely that you'll stick with the integrated GPU, and when you stick a dedicated GPU into both systems, the Intel outperforms the AMD with one hand tied behind its back.
aryonoco - Monday, January 6, 2014 - link
AT badly needs a flag/report button, or at least an ignore one.
It's one of the few websites where I actually read the comments cause they can, on occasion, be useful and informative, but when things like Intellover happen, it becomes very difficult to stay focused on the conversation.
Flunk - Monday, January 6, 2014 - link
Well, it's certainly an interesting way to go. It could be the way everything ends up in the future. We'll see when the final silicon is tested but I suspect that it will end up slotting in under Intel's i7 line somehow.
Mathos - Monday, January 6, 2014 - link
Most likely will, these are mainstream APUs, not high end desktop enthusiast CPUs. Though, I really want to see how well the 20% x86 IPC holds up. If they managed a solid 20% IPC improvement across the board, it could make it worthwhile for them to release an AM3+ replacement for the current FX series. It would at least put their CPUs real close to Intel's on IPC again.
cmikeh2 - Tuesday, January 7, 2014 - link
Their relatively recently leaked roadmap (of course it could be wrong) shows them not releasing any new FX processors in 2014.
silverblue - Tuesday, January 7, 2014 - link
It's very much open to interpretation. If Steamroller is 20% faster per clock, is this single or multithreaded? If it's 20% faster per core, does this mean that despite the 400MHz drop, each core is 20% faster than Richland? One thing that appears to be set in stone is that the improvement is in the x86 hardware, regardless of HSA/hUMA and what-have-you.
If we end up with a situation where Kaveri is 20% faster per clock, the A10-7850K will beat the A10-6800K by barely 10%, assuming the 20% is a realistic average. The downclocking of the Kaveri range to bring down power usage makes sense with a significantly larger GPU than before; I'd be very much interested in seeing the power usage of the A10-7700K considering that it allegedly has 384 GPU cores.
I believe that Steamroller's main improvement in CPU terms is removing the multithreading bottleneck, something that has hampered the Bulldozer architecture. A 30% performance boost chip-wise over the original FX series (the FX-4130 may be an ideal comparison - 3.8GHz with 3.9GHz turbo) would be a decent jump over the course of two years.
If we were to expect a 20% jump over Piledriver (still mainly multithreaded, I expect), then the following comparison may be of some use:
http://www.anandtech.com/bench/product/675?vs=677
Kaveri's aim should be to compete with the top i3s; each has four threads, the former having smaller cores with a view to parallelism and the latter having two fat cores with an ability to utilise them fully. I don't think we should expect Steamroller to compete with the i5s on a CPU core-for-core basis (especially not FP), and certainly not until Excavator. A lack of L3 will always hurt in such comparisons anyway.
Death666Angel - Tuesday, January 7, 2014 - link
20% better IPC = instructions per clock. Got your answer in the article. :-)
silverblue - Wednesday, January 8, 2014 - link
It looks like an inference by the author as opposed to anything else. We'll see in a week! :)
Notmyusualid - Tuesday, January 7, 2014 - link
Really, I shake my head when I see all the AMD-haters on forums like this.
Are their CPUs REALLY that bad? I mean, honestly?
I think not. There are positives and negatives on each side of the equation. I admit Intel have an edge in performance (especially in the mobile parts), but are they competitive? I think so.
My buddy has an AMD 6-core cpu (forgotten the actual cpu gen code), and it is fast. He even managed to drop it into an old motherboard too, which I thought was cool also.
He is not cracking passwords, nor playing games, but does some photographic image work, browsing, music & Bluray playback. No complaints at all.
Let us try to think back to the early 2000s, and the performance we had available then. Now THAT was slow, not what AMD are putting out today.
So he is happy, I'm happy he is happy, so why all the tears?
I'm not a fan-boi either way, I have dual 7970s, and am on my 3rd Intel i7 Extreme in a row. But I think some of you here need a life.
Flame-away...