CPU Performance: SPEC2006

SPEC2006 has been a natural goal to aim for as a keystone analysis benchmark, as it’s a respected industry standard that even silicon vendors use for architecture analysis and development. With SPEC2017 having been released last year, SPEC2006 is being officially retired on January 9th, a funny coincidence given that we are only now starting to use it.

As Android SoCs improve in power efficiency and performance, it’s becoming more practical to run SPEC2006 on consumer smartphones. The main concerns in the past were memory usage in subtests such as MCF and, more importantly, the sheer test runtimes on battery-powered devices. For a couple of weeks I’ve been busy porting SPEC2006 to a custom Android application harness.

The results are quite remarkable, as we see both generational performance and efficiency improvements from the various Android SoC vendors. The Kirin 970 in particular closes in on the efficiency of the Snapdragon 835, leapfrogging the Kirin 960 and the Exynos SoCs. Absolute performance, however, has not improved: the Kirin 970 shows a slight performance degradation compared to the Kirin 960, and all SoC vendors show only meagre performance gains over the past generation.

Going Into The Details

Our new SPEC2006 harness is compiled using the official Android NDK. For this article the NDK version used is r16rc1, with Clang/LLVM as the compiler and just the -Ofast optimization flag (alongside applicable test portability flags). Clang was chosen over GCC because Google has deprecated GCC in the NDK toolchain and will be removing the compiler altogether in 2018, making it unlikely that we’ll revisit GCC results in the future. It should be noted that in my testing GCC 4.9 still produced faster code than Clang in some SPEC subtests. Nevertheless, the choice of Clang should also facilitate better Androids-to-Apples comparisons in the future.

While there are arguments that SPEC scores should be published with the best compiler flags for each architecture, I wanted a more apples-to-apples approach using identical binaries (which is also what we expect to see distributed among real applications). As such, for this article I’ve chosen to pass the -mcpu=cortex-a53 flag to the compiler, as it gave the best average overall score among all tested CPUs. The only exception was the Exynos M2, which gained an additional 14% performance in perlbench when compiled with its corresponding CPU architecture target flag.

As the following SPEC scores are not submitted to the SPEC website we have to disclose that they represent only estimated values and thus are not officially validated submissions.
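For context, a SPECspeed-style score is the ratio of a reference machine's runtime to the measured runtime, and a suite score is the geometric mean of the per-benchmark ratios. The following is a minimal sketch of that arithmetic; the function names and the runtimes in the usage example are illustrative placeholders, not values from our harness or from the SPEC reference tables.

```python
import math


def spec_ratio(reference_seconds: float, measured_seconds: float) -> float:
    """Per-benchmark ratio: how many times faster than the reference machine."""
    if measured_seconds <= 0:
        raise ValueError("measured runtime must be positive")
    return reference_seconds / measured_seconds


def suite_score(ratios) -> float:
    """Suite score as the geometric mean of the per-benchmark ratios."""
    ratios = list(ratios)
    if not ratios:
        raise ValueError("need at least one ratio")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Placeholder runtimes in seconds: (reference, measured). Hypothetical values.
example_runs = {
    "bench_a": (1000.0, 100.0),
    "bench_b": (2000.0, 500.0),
}
ratios = [spec_ratio(ref, run) for ref, run in example_runs.values()]
score = suite_score(ratios)
```

Using a geometric mean keeps a single very fast or very slow subtest from dominating the suite score.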

Alongside the full CINT2006 suite we are also running the C/C++ subtests of CFP2006. Unfortunately, 10 of the 17 tests in the CFP2006 suite are written in Fortran; they can only be compiled with difficulty using GCC on Android, and the NDK Clang lacks a Fortran front-end.

As an overview of the subtests we run, here are the application areas and descriptions as listed on the official SPEC website:

SPEC2006 C/C++ Benchmarks

| Suite | Benchmark | Application Area | Description |
|---|---|---|---|
| SPECint2006 (Complete Suite) | 400.perlbench | Programming Language | Derived from Perl V5.8.7. The workload includes SpamAssassin, MHonArc (an email indexer), and specdiff (SPEC's tool that checks benchmark outputs). |
| SPECint2006 (Complete Suite) | 401.bzip2 | Compression | Julian Seward's bzip2 version 1.0.3, modified to do most work in memory, rather than doing I/O. |
| SPECint2006 (Complete Suite) | 403.gcc | C Compiler | Based on gcc Version 3.2, generates code for Opteron. |
| SPECint2006 (Complete Suite) | 429.mcf | Combinatorial Optimization | Vehicle scheduling. Uses a network simplex algorithm (which is also used in commercial products) to schedule public transport. |
| SPECint2006 (Complete Suite) | 445.gobmk | Artificial Intelligence: Go | Plays the game of Go, a simply described but deeply complex game. |
| SPECint2006 (Complete Suite) | 456.hmmer | Search Gene Sequence | Protein sequence analysis using profile hidden Markov models (profile HMMs). |
| SPECint2006 (Complete Suite) | 458.sjeng | Artificial Intelligence: Chess | A highly-ranked chess program that also plays several chess variants. |
| SPECint2006 (Complete Suite) | 462.libquantum | Physics / Quantum Computing | Simulates a quantum computer, running Shor's polynomial-time factorization algorithm. |
| SPECint2006 (Complete Suite) | 464.h264ref | Video Compression | A reference implementation of H.264/AVC, encodes a videostream using 2 parameter sets. The H.264/AVC standard is expected to replace MPEG2. |
| SPECint2006 (Complete Suite) | 471.omnetpp | Discrete Event Simulation | Uses the OMNet++ discrete event simulator to model a large Ethernet campus network. |
| SPECint2006 (Complete Suite) | 473.astar | Path-finding Algorithms | Pathfinding library for 2D maps, including the well known A* algorithm. |
| SPECint2006 (Complete Suite) | 483.xalancbmk | XML Processing | A modified version of Xalan-C++, which transforms XML documents to other document types. |
| SPECfp2006 (C/C++ Subtests) | 433.milc | Physics / Quantum Chromodynamics | A gauge field generating program for lattice gauge theory programs with dynamical quarks. |
| SPECfp2006 (C/C++ Subtests) | 444.namd | Biology / Molecular Dynamics | Simulates large biomolecular systems. The test case has 92,224 atoms of apolipoprotein A-I. |
| SPECfp2006 (C/C++ Subtests) | 447.dealII | Finite Element Analysis | deal.II is a C++ program library targeted at adaptive finite elements and error estimation. The testcase solves a Helmholtz-type equation with non-constant coefficients. |
| SPECfp2006 (C/C++ Subtests) | 450.soplex | Linear Programming, Optimization | Solves a linear program using a simplex algorithm and sparse linear algebra. Test cases include railroad planning and military airlift models. |
| SPECfp2006 (C/C++ Subtests) | 453.povray | Image Ray-tracing | Image rendering. The testcase is a 1280x1024 anti-aliased image of a landscape with some abstract objects with textures using a Perlin noise function. |
| SPECfp2006 (C/C++ Subtests) | 470.lbm | Fluid Dynamics | Implements the "Lattice-Boltzmann Method" to simulate incompressible fluids in 3D. |
| SPECfp2006 (C/C++ Subtests) | 482.sphinx3 | Speech Recognition | A widely-known speech recognition system from Carnegie Mellon University. |

It’s important to note one extremely distinguishing aspect of SPEC CPU versus other CPU benchmarks such as GeekBench: it’s not just a CPU benchmark, but rather a system benchmark. While benchmarks such as GeekBench serve as a good quick view of basic workloads, the vastly greater workload and codebase size of SPEC CPU stresses the memory subsystem to a much greater degree. To demonstrate this we can see the individual subtest performance differences when solely limiting the memory controller frequency, in this case on the Mate 10 Pro with the Kirin 970.

An increase in main memory latency from just 80ns to 115ns (Random access within access window) can have dramatic effects on many of the more memory access sensitive tests in SPEC CPU. Meanwhile the same handicap essentially has no effect on the GeekBench 4 single-threaded scores and only marginal effect on some subtests of the multi-threaded scores.

In general the benchmarks can be grouped into three categories: memory-bound, balanced between memory and execution, and execution-bound. From the memory latency sensitivity chart it’s relatively easy to find out which benchmarks belong to which category based on the performance degradation. The worst memory-bound benchmarks include the infamous 429.mcf, alongside 433.milc, 450.soplex, 470.lbm and 482.sphinx3. The least affected are 400.perlbench, 445.gobmk, 456.hmmer, 464.h264ref, 444.namd and 453.povray, with 458.sjeng and 462.libquantum even increasing slightly in performance, which points to very saturated execution units. The remaining benchmarks are more balanced and see a reduced impact on performance. Of course this is an oversimplification and the results will differ between architectures and platforms, but it gives us a solid hint as to the separation between execution-bound and memory-access-bound tests.
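The grouping described above can be sketched as a simple threshold rule on the performance lost when memory latency is increased. This is only an illustration: the thresholds (20% and 5%) and the scores in the usage example are hypothetical and are not taken from our measurements.

```python
def degradation_percent(baseline_score: float, throttled_score: float) -> float:
    """Performance lost (in percent) when running with the slower memory setting.

    A negative value means the benchmark got slightly faster.
    """
    if baseline_score <= 0:
        raise ValueError("baseline score must be positive")
    return (baseline_score - throttled_score) / baseline_score * 100.0


def classify(degradation: float,
             memory_bound_above: float = 20.0,     # hypothetical threshold
             execution_bound_below: float = 5.0):  # hypothetical threshold
    """Assign one of the three categories based on the measured degradation."""
    if degradation >= memory_bound_above:
        return "memory-bound"
    if degradation <= execution_bound_below:
        return "execution-bound"
    return "balanced"


# Hypothetical (baseline, throttled) scores purely to exercise the rule.
samples = {
    "heavy_memory_test": (10.0, 7.0),
    "mixed_test": (10.0, 9.0),
    "compute_test": (10.0, 10.2),
}
categories = {
    name: classify(degradation_percent(base, slow))
    for name, (base, slow) in samples.items()
}
```

As noted in the text, such a cut-off is an oversimplification; where the thresholds sit is a judgment call and would differ between architectures and platforms.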

As well as tracking performance (SPECspeed), I also included a power tracking mechanism which relies on the device’s fuel gauge for current measurements. The values published here represent only the active power of the platform, meaning idle power is subtracted from the total absolute load power during the workloads to compensate for platform components such as the display. Again I have to emphasize that the power and energy figures don't just represent the CPU, but the SoC system as a whole, including interconnects, memory controllers, DRAM, and PMIC overhead.
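The active-power arithmetic can be sketched as follows. This is a simplified illustration and not the harness code: it assumes the fuel gauge reports paired current and voltage samples, and the function names, units (mA, mV) and sample values are assumptions made for the example.

```python
from statistics import mean


def sample_power_w(current_ma: float, voltage_mv: float) -> float:
    """Instantaneous power in watts from one fuel-gauge sample (P = I * V)."""
    return (current_ma / 1000.0) * (voltage_mv / 1000.0)


def active_power_w(load_samples, idle_samples) -> float:
    """Average load power minus average idle power, in watts.

    Each sample is a (current_ma, voltage_mv) pair. Subtracting idle power
    removes the contribution of always-on components such as the display.
    """
    load = mean(sample_power_w(i, v) for i, v in load_samples)
    idle = mean(sample_power_w(i, v) for i, v in idle_samples)
    return load - idle


def energy_joules(active_w: float, runtime_s: float) -> float:
    """Energy used by the workload: active power integrated over the runtime."""
    return active_w * runtime_s


# Hypothetical samples, for illustration only.
load = [(1000.0, 4000.0), (1200.0, 4000.0)]  # 4.0 W and 4.8 W
idle = [(250.0, 4000.0)]                     # 1.0 W
power = active_power_w(load, idle)
energy = energy_joules(power, runtime_s=100.0)
```

Reporting energy alongside power matters because a faster core may draw more power yet finish sooner, so total energy for the workload is what reflects efficiency.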

Alongside the current-generation SoCs I also included a few predecessors to track the progress made over the last two years in the Android space and across CPU microarchitecture generations. Because the runtime of all benchmarks is in excess of 5 hours for the fastest devices, we actively cool the phones with an external fan to ensure consistent DVFS frequencies across all of the subtests, so that we don’t favour the early tests.

116 Comments

  • lilmoe - Monday, January 22, 2018 - link

    Unfortunately, they're not "fully" vertical as of yet. They've been held back since the start by Qualcomm's platform, because of licensing and "other" issues that no one seems to be willing to explain. Like Andrei said, they use the lowest common denominator of both the Exynos and Snapdragon platforms, and that's almost always lower on the Snapdragons.

    Where I disagree with Andrei, and others, are the efficiency numbers and the type of workloads used to reach those results. Measuring efficiency at MAX CPU and GPU load is unrealistic, and frankly, misleading. Under no circumstance is there a smartphone workload that demands that kind of constant load from either the CPU or GPU. A better measure would be running a actual popular game for 30 mins in airplane mode and measuring power consumption accordingly, or loading popular websites, using the native browser, and measuring power draw at set intervals for a set period of time (not even a benchmarking web application).

    Again, these platforms are designed for actual, real world, modern smartphone workloads, usually running Android. They do NOT run workstation workloads and shouldn't be measured as such. Such notions, like Andrei has admitted, are what push OEMs to be "benchmark competitive", not "experience competitive". Apple is also guilty of this (proof is in the latest events, where their power delivery can't handle the SoC, or the SoC is designed well above sustainable TDP). I can't stress this enough. You just don't run SPEC and then measure "efficiency". It just doesn't work that way. There is no app out there that stresses a smartphone SoC this much, not even the leading game. As a matter of fact, there isn't an Android (or iPhone) game that saturates last year's flagship GPU (probably not even the year before).

    We've reached a point of perfectly acceptable CPU and GPU performance for flagships running 1080p and 1440p resolution screens at this point. Co-processors, such as the decoder, ISP, DSP and NPU, in addition to software optimization are far, FAR more more important at this time, and what Huawei has done with their NPU is very interesting and meaningful. Kudos to them. I just hope these co-processors are meant to improve the experience, not collect and process private user data in any form.
  • star-affinity - Monday, January 22, 2018 - link

    Just curious about your claims about Apple – so you think it's a design fault? I'm thinking that the problem arise only when the battery has been worn out and a healthy battery won't have the problem of not sustaining enough juice for the SoC.
  • lilmoe - Monday, January 22, 2018 - link

    Their batteries are too small, by design, so that's the first design flaw. But that still shouldn't warrant unexpected slowdowns within 12-18 months of normal usage; their SoCs are too power hungry at peak performance, and the constant bursts were taking their toll on the already smaller batteries that weren't protected with a proper power delivery system. It goes both ways.
  • Samus - Monday, January 22, 2018 - link

    Exactly this. Apple still uses 1500mah batteries in 4.7" phones. When more than half the energy is depleted in a cell this small, the nominal voltage drops to 3.6-3.7v from the 3.9-4.0v peak. A sudden spike in demand for a cell hovering around 3.6v could cause it to hit the low-voltage cutoff, normally 3.4v for Li-Ion, and 3.5v for Li-Polymer, to prevent damage to the chemistry the internal power management will shut the phone down, or slow the phone down to prevent these voltage drops.

    Apple designed their software to protect the hardware. It isn't necessarily a hardware problem, it's just an inherently flawed design. A larger battery that can sustain voltage drops, or even a capacitor, both of which take up "valuable space" according to Apple, like that headphone jack that was erroneously eliminated for no reason. A guy even successfully reinstalled a Headphone jack in an iPhone 7 without losing any functionality...it was just a matter of relocating some components.
  • ZolaIII - Wednesday, January 24, 2018 - link

    Try with Dolphine emulator & you will see not only how stressed GPU is but also how much more performance it needs.
  • Shadowfax_25 - Monday, January 22, 2018 - link

    "Rather than using Exynos as an exclusive keystone component of the Galaxy series, Samsing has instead been dual-sourcing it along with Qualcomm’s Snapdragon SoCs."

    This is a bit untrue. It's well known that Qualcomm's CDMA patents are the stumbling block for Samsung. We'll probably see Exynos-based models in the US within the next two versions once Verizon phases out their CDMA network.
  • Andrei Frumusanu - Monday, January 22, 2018 - link

    Samsung has already introduced a CDMA capable Exynos in the 7872 and also offers a standalone CDMA capable modem (S359). Two year's ago when I talked to SLSI's VP they openly said that it's not a technical issue of introducing CDMA and it'll take them two years to bring it to market once they decide they need to do so (hey maybe I was the catalyst!), but they didn't clarify the reason why it wasn't done earlier. Of course the whole topic is a hot mess and we can only speculate as outsiders.
  • KarlKastor - Thursday, January 25, 2018 - link

    Uh, how many devices have shipped yet with the 7872?
    Why do you think they came with a MDM9635 in the Galaxy S6 in all CDMA2000 regions? In all other regions their used their integrated shannon modem.
    The other option is to use a Snapdragon SoC with QC Modem. They also with opt for this alternative but in the S6 they don't wanted to use the crappy Snapdragon 810.

    It is possible, that Qualcomm today skip their politics concerning CDMA2000 because it is obsolete.
  • jjj - Monday, January 22, 2018 - link

    Don't forget that Qualcomm is a foundry customer for Samsung and that could be why they still use it.
    Also, cost is a major factor when it comes to vertical integration, at sufficient scale integration can be much cheaper.
    What Huawei isn't doing is to prioritize the user experience and use their high end SoCs in lower end devices too, that's a huge mistake. They got much lower costs than others in high end and gaining scale by using these SoCs in lower end devices, would decrease costs further. It's an opportunity for much more meaningful differentiation that they fail to exploit. Granted, the upside is being reduced nowadays by upper mid range SoCs with big cores and Huawei might be forced into using their high end SoCs more as the competition between Qualcomm and Mediatek is rather ferocious and upper mid becomes better and better.

    Got to wonder about A75 and the clocks it arrives at ... While at it, I hope that maybe you take a close look at the SD670 when it arrives as it seems it will slightly beat SD835 in CPU perf.

    On the GPU side, the biggest problem is the lack of real world tests. In PC we have that and we buy what we need, in mobile somehow being anything but first is a disaster and that's nuts. Not everybody needs a Ferrari but mobile reviews are trying to sell one to everybody.
  • HStewart - Monday, January 22, 2018 - link

    This could be good example why Windows 10 for ARM will failed - it only works for Qualcomm CPU and could explain why Samsung created Intel based Windows Tablets

    I do believe that ARM especially Samsung has good market in Phone and Tablets - I love my Samsung Tab S3 but I also love my Samsung TabPro S - both have different purposes.
