Mobile Ivy Bridge HD 4000 Investigation: Real-Time iGPU Clocks on ULV vs. Quad-Core
by Jarred Walton on June 1, 2012 1:25 AM EST - Posted in
- Laptops
- Intel
- Ivy Bridge
- graphics
- Ultrabook
Earlier today we posted our review of Intel’s Ivy Bridge ULV Ultrabook. In it, we found that while the maximum iGPU clock on the ULV HD 4000 is 1150MHz—just 8% lower than the maximum iGPU clock on the quad-core HD 4000—performance in most games is significantly lower. We postulated that the reduced thermal headroom created by the 17W TDP was at least partially to blame, but we didn’t have time to dig deeper at the time. We received several requests to investigate further, with the suggestion of using GPU-Z logging to capture the real-time iGPU clocks. With those logs now in hand, we’re ready to present additional information that explains why we’re seeing larger performance discrepancies despite the relatively minor (<10%) iGPU clock speed difference.
We didn’t run tests on our entire gaming suite, but we did investigate iGPU clocks in six of the eight games. We skipped the other two, Portal 2 and Skyrim, mostly because the pattern we’ll discuss is already quite clear. For each game, with the exception of Diablo III, we ran through our benchmark sequence while logging the iGPU clocks using GPU-Z. In order to make the charts presentable and easier to compare, we trimmed out small amounts of data so that we have the same amount of data for each game on each laptop. Note that we did check average frame rates with and without the extra frames and made sure that removing a few seconds of logging never made more than a 1% difference; typically it was less than 0.1%.
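For anyone who wants to replicate the process, here’s a minimal sketch of the trimming and sanity check described above. It assumes GPU-Z’s "Log to file" CSV output with a "GPU Clock [MHz]" column; the file names and column header are illustrative rather than exact, and may vary by GPU-Z version.

```python
import csv

def load_clocks(path, column="GPU Clock [MHz]"):
    """Read one iGPU clock sample per row from a GPU-Z style CSV log."""
    with open(path, newline="") as f:
        return [float(row[column]) for row in csv.DictReader(f)
                if (row.get(column) or "").strip()]

# Hypothetical file names for the two test laptops:
ulv = load_clocks("gpuz_i5-3427U_dirt3.csv")
quad = load_clocks("gpuz_i7-3720QM_dirt3.csv")

# Trim both logs to the same number of samples (drop the extra tail)...
n = min(len(ulv), len(quad))
for name, full in (("ULV", ulv), ("quad-core", quad)):
    trimmed = full[:n]
    # ...and confirm the trim shifts the average by well under 1%.
    avg_full = sum(full) / len(full)
    avg_trim = sum(trimmed) / len(trimmed)
    print(f"{name}: trimming shifts the average by "
          f"{100 * abs(avg_full - avg_trim) / avg_full:.3f}%")
```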
What about Diablo III? Since there’s not a great way to do a consistent benchmark sequence—the game world is randomly generated within certain bounds each time you play, at least for areas with combat—we simply played the game for around 15 minutes while collecting iGPU clock data. This includes killing monsters (with several larger mobs on each system), visiting town (in Act 2), checking out inventory, etc. As with the other games, the pattern of iGPU clock speed is pretty clear.
While we’re talking iGPU clock speeds, it also helps to keep average frame rates in mind. Let’s start with a recap of gaming performance for our two test laptops (leaving out Diablo III since benchmark results are from slightly different areas).
* i7-3720QM tested with 2696 driver; i5-3427U with 2725 driver.
Portal 2 performance is significantly improved with the 2725 driver.
As noted in our Ultrabook review, quad-core Ivy Bridge performs substantially better in games than ULV dual-core Ivy Bridge. This isn’t too surprising, as we’ve seen the same thing with HD 3000 in Ultrabooks compared to dual- and quad-core notebooks. Our assumption has always been that the lowered iGPU clocks in Ultrabooks were the primary culprit, but with Ivy Bridge the maximum clock speeds aren’t all that different—i7-3720QM can hit a maximum iGPU clock that’s just 9% higher than i5-3427U. When it comes to actual gaming performance, however, quad-core IVB ends up 35% faster on average—and that’s not even accounting for the substantially improved Portal 2 performance thanks to a new driver on the Ultrabook.
The question then is whether we’re hitting thermal constraints on IVB ULV, and whether we’re GPU limited, CPU limited, or maybe even both. Before we get to the detailed graphs, here’s the quick summary of the average HD 4000 clocks on the Ivy Bridge Ultrabook (i5-3427U) and the ASUS N56VM (i7-3720QM).
So, that makes things quite a bit easier to understand, doesn’t it? During actual game play (i.e. when the iGPU isn’t idle and clocked at its minimum 350MHz), the quad-core IVB laptop is able to run at or very close to its maximum graphics turbo clock of 1250MHz, and as we’ll see in a moment it never dropped below 1200MHz. By comparison, in most games the ULV IVB laptop averages clock speeds that are 50-150MHz lower than its maximum 1150MHz turbo clock. With that overview out of the way, here are the detailed graphs of iGPU clocks for the tested games.
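As a note on methodology, the "game play" averages exclude idle samples. Something like the following reproduces that filtering, reusing the per-sample clock lists from the earlier sketch and assuming 350MHz is the idle floor.

```python
IDLE_MHZ = 350  # the iGPU's minimum clock; samples at this floor are menus/idle

def gameplay_average(clocks, idle=IDLE_MHZ):
    """Average iGPU clock over samples where the iGPU isn't idling."""
    active = [c for c in clocks if c > idle]
    return sum(active) / len(active) if active else float(idle)

print(f"ULV gameplay average: {gameplay_average(ulv):.0f}MHz")
```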
The GPU clocks over time tell the story even better. The larger i7-3720QM notebook basically runs at top speed in every game we threw at it, other than a few minor fluctuations. With the smaller ULV i5-3427U, we see periodic spurts up to the maximum 1150MHz clock, but we also see dips down as low as 900MHz. (We’re not counting the bigger dips to 350MHz that happen in Batman during the benchmark scene transitions, though it is interesting that those don’t show up on 4C IVB at all.)
After performing all of the above tests, we end up with an average iGPU clock on the i5-3427U of around 1050MHz, compared to 1250MHz on the larger notebook. That gives the quad-core chip nearly a 20% advantage in iGPU clocks, but in the actual games we’re still seeing a much greater advantage. Given that most games don’t actually use much more than two CPU cores (if that), even in 2012, the most likely explanation for the remaining gap is CPU clock speed: from what we’ve seen of the ULV chip under load, the quad-core processor is not only rated for higher clocks but is also far more likely to be sustaining its higher CPU Turbo Boost speeds.
It’s at this point that I have to admit I decided to wrap things up without doing a ton more testing. Having already run a bunch of tests with GPU-Z logging iGPU clocks in the background, I now wanted to find a way to log the CPU core clocks in a similar fashion. CPU-Z doesn’t support logging, and TMonitor doesn’t work with Ivy Bridge, so I needed to find a different utility. I eventually found HWiNFO64, which did exactly what I wanted (and more)—though I can’t say the UI is as user friendly as I’d like.
The short story is that if I had started with HWiNFO64, I could have gathered both CPU and GPU clocks simultaneously, which makes the charts more informative. Since we’re dealing with dual-core and quad-core processors, I have two lines in each chart: Max CPU is the highest clocked CPU core at each measurement point, while Avg CPU is the average clock speed across all two/four cores. There’s not always a significant difference between the two values, but at least on the quad-core IVB we’ll see plenty of times where the average CPU clock is a lot lower than the maximum core clock. Besides the CPU clocks, we again have the GPU clock reported.
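For reference, this is roughly how the two lines are computed from an HWiNFO64 CSV export. It’s a sketch assuming one "Core N Clock [MHz]" column per core; HWiNFO’s actual sensor labels vary by version and system.

```python
import csv

def cpu_clock_lines(path, cores=4):
    """Per-sample Max CPU (fastest core) and Avg CPU (mean across all cores)."""
    max_cpu, avg_cpu = [], []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            clocks = [float(row[f"Core {i} Clock [MHz]"]) for i in range(cores)]
            max_cpu.append(max(clocks))
            avg_cpu.append(sum(clocks) / cores)
    return max_cpu, avg_cpu

# Hypothetical log file from the quad-core notebook:
max_cpu, avg_cpu = cpu_clock_lines("hwinfo_i7-3720QM_dirt3.csv", cores=4)
print(f"average of the per-sample Max CPU: {sum(max_cpu) / len(max_cpu):.0f}MHz")
```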
As mentioned a minute ago, at this point I was ready to wrap up and figured I had enough information, so I just reran the DiRT 3 and Diablo III tests. DiRT 3 is one of the worst results for IVB ULV compared to IVB 4C, while Diablo III is more in line with the difference in iGPU clocks (e.g. the quad-core notebook is around 20-25% faster). So here are the final two charts, this time showing CPU and GPU clocks.
As before, the quad-core notebook runs HD 4000 at 1250MHz pretty much the entire time. Not only does the iGPU hit maximum Turbo Boost, but the CPU is likewise running at higher Turbo modes throughout the test. i7-3720QM can turbo up to a maximum clock speed of 3.6GHz on just one core, and the average of the “Max CPU” clock ends up being 3442MHz in DiRT 3 and 3425MHz in Diablo III.
i5-3427U also has a decent maximum CPU Turbo Boost clock of 2.8GHz, but it rarely gets that high. Diablo III peaks early on (when the game is still loading, actually) and then quickly settles down to a steady 1.8GHz—the “guaranteed CPU clock”, so no turbo is in effect. The overall average “Max CPU” clock in Diablo III is 1920MHz, but most of the higher clocks come at the beginning and end of the test results when we’re in the menu or exiting the game. DiRT 3 has higher CPU clocks than Diablo III on average—2066MHz for the “Max CPU”—but the average iGPU clock is slightly lower.
Interestingly, HWiNFO also provides measurements of CPU Package Power (the entire chip), IA Cores Power (just the CPU), and GT Cores Power (just the iGPU). During our test runs, Diablo III on the ULV Ultrabook shows an average package power of 16.65W and a maximum package power of 18.75W (exceeding the TDP for short periods), with the CPU drawing an average of 3.9W (6.22W max) and the iGPU drawing 9.14W on average (and 10.89W max)—the rest of the power use presumably goes to things like the memory controller and cache. The DiRT 3 results are similar, but with a bit more of the load shifted to the CPU: 15.73W package, 4.39W CPU, and 8.3W iGPU (again with maximum package power hitting 18.51W briefly). For the N56VM/i7-3720QM, the results for Diablo III are: 30.27W package, 12.28W CPU, 13.43W iGPU (and maximum package power of 34.27W). DiRT 3 gives results of 32.06W package, 13.2W CPU, and 14.8W iGPU (max power of 38.4W).
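Summarizing those power numbers from an HWiNFO64 log is straightforward; here’s an illustrative sketch, assuming sensor columns named "CPU Package Power [W]", "IA Cores Power [W]", and "GT Cores Power [W]" (again, the exact labels are approximate and version-dependent).

```python
import csv

POWER_COLS = ("CPU Package Power [W]", "IA Cores Power [W]", "GT Cores Power [W]")

def power_summary(path, cols=POWER_COLS):
    """Print the average and peak power per sensor over the whole log."""
    data = {c: [] for c in cols}
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            for c in cols:
                data[c].append(float(row[c]))
    for c, vals in data.items():
        print(f"{c}: avg {sum(vals) / len(vals):.2f}W, max {max(vals):.2f}W")

power_summary("hwinfo_i5-3427U_diablo3.csv")  # hypothetical file name
```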
Wrap-Up
Most of this shouldn’t come as a surprise. In a thermally constrained environment (17W for the entire package), it’s going to be difficult to get higher performance from a chip. If you start from square one building a chip for a low power environment (e.g. a tablet or smartphone) and scale up, you can usually get better efficiency than if you start with a higher power part and scale down—the typical scaling range is around an order of magnitude—but if you need more performance you might fall short. The reverse also holds: starting at the top and scaling down on power and performance, you might eventually come up short if you need to use less power.
As far as Ivy Bridge goes, HD 4000 can offer relatively competitive performance, but it looks like it needs 10-15W for the iGPU alone to get there. On a 45W TDP part, that’s no problem, but with ULV it looks like Ivy Bridge ends up in an area where it can’t quite deliver maximum CPU and iGPU performance at the same time. This generally means iGPU clocks will be closer to 1000MHz than 1150MHz, and it also means the CPU portion of the chip will be closer to its rated clock speed than to its maximum Turbo Boost speed. One final item to keep in mind is just how much performance we’re getting out of a chip that uses a maximum of 17W. ULV IVB isn’t going to offer gaming performance comparable to an entry-level graphics solution, but then even the low-end discrete mobile GPUs often use 25W or more. Cut the wattage in half, and as you’d expect the performance suffers.
So how much faster can we get with ULV chips, particularly with regards to gaming? Intel has a new GPU architecture with Ivy Bridge that represents a significant update from the HD 3000 iGPU, but they’re still trailing AMD and NVIDIA in the graphics market. Their next architecture, Haswell, looks to put even more emphasis on the iGPU, so at least on higher TDP chips we could very well see as much as triple the performance of HD 4000 (if rumors are to be believed). How will that fit into ULV? Even if ULV Haswell graphics are only half as fast as full voltage chips, they should still be a decent step up from the current full voltage HD 4000 performance, which seems pretty good. Too bad we’ll have to wait another year or so to see it!
As for AMD, they appear to be in much the same situation, only they’ve got better GPU performance in Trinity with less CPU performance. The problem is again trying to get decent performance out of a lower power solution, and while the 25W A10-4655M Trinity part looks quite attractive, the 17W A6-4455M part has to make do with half the CPU and GPU cores. ULV Ivy Bridge is only able to deliver about 70% of the graphics performance of full voltage (45W) Ivy Bridge, and I don’t expect ULV Trinity to fare much better. Hopefully we’ll be able to get some hardware in hand in the near future to find out. Unfortunately, at least judging by GPU-Z and HWiNFO results on an A10-4600M, we still don’t have a good way of getting real-time iGPU clocks from Trinity or Llano.
Comments
MrSpadge - Saturday, June 2, 2012 - link
For Brazos there's Brazos Tweaker. However, there's not much juice left in this CPU. For Intel we had RMClock back in the C2D days. I used it to lower the real-world full CPU load power consumption of 2 of my CPUs by 10 W each, from ~35 W to 25 W (for the entire machine). That was pretty significant, especially since it drove the fan down from speed 3 to speed 1 (lowest). I guess Intel is setting the voltage a bit tighter on current products, since they're now completely power consumption constrained... but I'm sure there'll be some reserve left.
Oh, I think ThrottleStop (or so) was trying to implement customizable voltages on Core i... not sure how successful, though.
JarredWalton - Saturday, June 2, 2012 - link
ThrottleStop does not work for setting voltages, at least not on the laptops where I've installed it -- that goes for Sandy Bridge as well as Ivy Bridge; I never tried with Arrandale. The problem is that I think many laptops use either non-programmable voltage controllers, or they're programmable but completely undocumented, so unless you know exactly how to talk to each and every laptop it won't work.
manguszta - Friday, June 1, 2012 - link
Excellent article. I'm wondering what the results would be with an i7 that has the same 650-1250MHz graphics frequency range as the 3720QM, but with a 35W TDP and only two cores: the i7-3520M.
I agree that most games are perfectly happy with two cores, so how would the 3520M score against the tested ULV and quad-core chips?
JarredWalton - Friday, June 1, 2012 - link
Given the results with the i5-2410M and i5-2520M vs. i7-2820QM, performance should be very similar. There are a few instances where the larger L3 cache will help the quad-core to the tune of 5-10%, but with a 35W TDP and only two CPU cores we rarely saw issues on Sandy Bridge, and I expect Ivy Bridge will be the same. We'll know for certain when we get one of the 35W dual-core chips in a review laptop. :-)
IntelUser2000 - Friday, June 1, 2012 - link
Considering that on Linux enabling L3 cache use for the Sandy Bridge iGPU resulted in only a 15-20% improvement, I doubt you'd see that much of a difference due to L3 cache differences. 5-10% would require a significant difference in hit rates, like going from a 100% hit rate down to 50%. Plus, Ivy Bridge's iGPU has its own L3 cache, further reducing the impact.
Jamahl - Friday, June 1, 2012 - link
Almost like an old AT article back when it was worth reading. Well done... and more of this in future, please.
tuxRoller - Friday, June 1, 2012 - link
Does anyone else recall Intel speaking about the potential of mobile IVB, specifically how, when combined with a cooling dock, the package would be able to stay at Turbo for longer periods? Was that Intel making hypothetical statements, or have some OEMs expressed interest in doing this?
MrSpadge - Saturday, June 2, 2012 - link
Configurable TDP is there, but it's up to the OEMs to enable this feature.
secretmanofagent - Friday, June 1, 2012 - link
Silly question: what does the current MacBook Air have, a 17W or a 35W part? I'm just curious; I'm due to replace my five-year-old Mac with a new one. If they're currently 17W, then I'll probably hold off until next year for Haswell.
Pretty sure it's 17W; my Air is much slower than my X220 (35W).