Given yesterday's announcement of the Cortex A53 based Snapdragon 410, the timing of our latest Ask the Experts installment couldn't be better. Peter Greenhalgh, lead architect of the Cortex A53, has agreed to spend some time with us and answer any burning questions you might have about ARM directly.

Peter has worked in ARM's processor division for 13 years, contributing to the Cortex R4, Cortex A8 and Cortex A5 (as well as the ARM1176JZF-S and ARM1136JF-S). He was also the lead architect of the Cortex A7 and ARM's big.LITTLE technology.

Later this month I'll be doing a live discussion with Peter via Google Hangouts, but you guys get first crack at him. If you have any questions about the Cortex A7, Cortex A53, big.LITTLE or pretty much anything else ARM related, fire away in the comments below. Peter will be answering your questions personally over the next week.

Please help make Peter feel at home here on AnandTech by impressing him with your questions. Do a good job here and I might even be able to convince him to give away some ARM powered goodies...

Comments

  • Doormat - Tuesday, December 10, 2013 - link

    ARM CPU vendors (Qualcomm, Nvidia, etc.) seem to be choosing slower quad cores over faster dual cores, and I suspect it's all a marketing game (e.g. more cores is better; see Motorola's X8 announcement of an "8 core" phone). Do those non-technical decisions impact the decisions of the engineers developing the ARM architecture?
  • dishayu - Wednesday, December 11, 2013 - link

    Care to present some examples, please?

    NVidia used 4+1 A15 cores (the fastest available at the time) for Tegra 4. And Qualcomm doesn't use generic ARM cores; they have their own (Krait) architecture, and the most popular SoCs based on their fastest architectures (Krait 300/400) are almost exclusively quad-core.
  • Peter Greenhalgh - Wednesday, December 11, 2013 - link

    Hi Doormat,

    You are quite correct that there are a variety of frequencies and core-counts being offered by ARM partners. However, for the micro-architectures ARM designs, these choices do not drive the micro-architecture itself, as we must be able to support a variety of target frequencies and core-counts across many different process geometries.
  • Factory Factory - Tuesday, December 10, 2013 - link

    How does designing a CPU "by hand" differ from using an automated layout tool? What sort of trade-offs does/would using automated tools cause for ARM's cores?

    Second question: With many chips from many manufacturers now implementing technologies like fine-grained power gating, extremely fine control of power and clock states, and efficient out-of-order execution pipelines, where does ARM go from here to keep its leadership in low-power compute IP?
  • Peter Greenhalgh - Wednesday, December 11, 2013 - link

    Hi Factory,

    Hand layout versus automated layout is an interesting trade-off. From one perspective, full hand-layout for all circuits in a processor is rarely used now. Aside from cache RAMs, which are always custom, hand-layout is reserved for datapaths and queues: regular structures that allow a human to spot the regularity and ‘beat’ an automated approach. However, control logic is not amenable to hand-layout as it’s very difficult to beat automated tools, which means that, without significant effort, the control logic can end up setting the frequency of the processor.

    In general the benefit from hand-layout has been reducing in recent years. Partly this is due to the complexity of the design rules for advanced process generations, which reduces the scope for more specific circuit-tuning techniques. Another factor is the development of advanced standard cell libraries that have a large variety of cells and drive strengths, which lessens the need for special circuit techniques. When we’re developing our processors we’re fortunate to have access to a large team in ARM designing standard cell libraries and RAMs who can advise us about upcoming nodes (for example 16nm and 10nm). In turn, the processor teams can suggest & trial new advanced cells for these libraries, which we call POPs (Processor Optimization Packages), improving frequency, power and area.

    A final trade-off to consider is process portability. After an ARM processor is licensed, we see it on many different process geometries, which is only possible because the designs are fully synthesizable. For example, there are Cortex-A7 implementations at all the major foundries from 65nm to 16nm. In combination with the advanced standard cell libraries for these processes, there is little need for a hand-layout approach; instead we enable our partners to get to market more rapidly on the process geometry and foundry of their choosing.
  • mrdude - Tuesday, December 10, 2013 - link

    A few questions:

    When can we expect an end to software-based DVFS scaling? It seems to me to be the biggest hurdle in the ARM sphere towards higher single-threaded performance.

    The current takes on your big.LITTLE architecture have been somewhat suboptimal (the Exynos cache flush as an example), so what can we expect from ARM themselves to skirt/address these issues? It seems to me to be a solid approach given the absolutely minuscule power and die budget that your little cores occupy, but there are still the issues of software and hardware implementation before it becomes widely accepted.

    Though this question might be better posed to the GPU division, are we going to be seeing unified memory across the GPU and CPU cores in the near future? ARM joining HSA seems to point to a more coherent hardware architecture and programming emphasis.

    Pardon the grammatical errors as I'm typing this on my phone. Big thanks to Anand and Peter.
  • Peter Greenhalgh - Wednesday, December 11, 2013 - link

    Hi Mrdude,

    While there are platforms that use hardware event monitors to influence DVFS policy, this is usually underneath a Software DVFS framework. Software DVFS is powerful in that it has a global view of all activity across a platform over time, whereas Hardware DVFS relies on building up a picture from lots of individual events which have little to no relationship with one another. As an analogy, Software DVFS is like directing traffic from a helicopter with a very clear view of what is going on across all the roads in a city (but greater latency when forcing a change), whereas Hardware DVFS is like trying to pool information from hundreds of traffic cops, each feeding in traffic information from their street corner. A traffic cop might be able to change traffic flow on their street corner, but it may not be the best policy for the traffic in the city!

    Like all things in life, there are trade-offs: neither approach is absolutely perfect in all situations, and hardware DVFS solutions rely on the Software DVFS helicopter too.
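    As a rough illustration of the software side of that trade-off, here is a minimal sketch of what a software DVFS governor's decision loop could look like. Everything in it is hypothetical and for illustration only (the frequency table, the thresholds, the stubbed utilisation and clock-programming hooks); it is not ARM's, Linux's or any vendor's actual governor.

        /* Minimal sketch of a software DVFS governor's decision loop.
         * All values and platform hooks below are illustrative stand-ins. */
        #include <stdio.h>
        #include <stdlib.h>

        /* Hypothetical operating points for a CPU cluster, in kHz. */
        static const int freq_table_khz[] = { 400000, 800000, 1200000, 1600000 };
        enum { NUM_OPPS = sizeof(freq_table_khz) / sizeof(freq_table_khz[0]) };

        /* Stand-in for a platform hook that would derive utilisation from
         * performance counters or idle-time accounting (here: random). */
        static int read_cluster_utilisation_pct(void) { return rand() % 101; }

        /* Stand-in for the hook that would program the clock and regulator. */
        static void request_frequency_khz(int khz) { printf("-> %d kHz\n", khz); }

        int main(void)
        {
            int cur = 0; /* index of the current operating point */

            for (int tick = 0; tick < 20; tick++) { /* 20 sample periods for the demo */
                int util = read_cluster_utilisation_pct();

                /* The governor sees one global utilisation figure and reacts with
                 * some latency: ramp up aggressively under load, relax one step
                 * at a time when mostly idle. */
                if (util > 80)
                    cur = NUM_OPPS - 1;  /* busy burst: jump straight to max */
                else if (util < 30 && cur > 0)
                    cur--;               /* mostly idle: step down one point */

                printf("util=%3d%% ", util);
                request_frequency_khz(freq_table_khz[cur]);
            }
            return 0;
        }

    A hardware scheme would instead react to local event counters with much lower latency, which is one reason the two are combined rather than treated as alternatives, as noted above.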
  • nafhan - Tuesday, December 10, 2013 - link

    This may not be something you can answer, but is there a timeline for a 64-bit follow-on to Krait?

    Also, do you have any thoughts regarding clock speed vs. instruction width scaling, and which route Qualcomm plans to take (with Apple going the instruction width route with the A7 and Qualcomm currently going the clock speed route with recent SoCs)?
  • coder543 - Tuesday, December 10, 2013 - link

    ARM != Qualcomm. Qualcomm designs their own stuff; this guy is from ARM. Even if he knew the answers to those questions, they're neither on topic nor is he at liberty to discuss them. He probably doesn't even want to talk about that, considering Qualcomm isn't exactly giving ARM any compliments by throwing out all of ARM's work and starting from scratch.
  • nafhan - Tuesday, December 10, 2013 - link

    I just finished reading the Snapdragon 410 article, and I thought I read Qualcomm in here somewhere... you are absolutely correct.

    Still, I don't think licensing ARM's ISA (a la Krait) is an insult to ARM. That's a big part of their business model.
