We covered the launch of the Calxeda-based Boston Viridis ARM server back in July. The server is making its appearance at the UK IP EXPO 2012, and Boston has been blogging about their work on the Viridis over the last few months. One of the most interesting developments is that x86 binary translation now works on the Viridis. The technology comes from Eltech, who have apparently given the Calxeda platform their seal of approval by indicating that the Boston Viridis was the fastest platform they had tested.

Eltech appears to be doing dynamic binary translation, i.e., x86 binaries are translated to ARM code on the fly. That makes the translated code a bit bulky (heavier on the I-cache), and the overhead is relatively large compared to, say, VMware's binary translator (BT), which translates x86 to x86, because of the need to bridge two different ISAs.

Eltech uses a 1 MB translator cache (similar to the translator cache of VMware's BT), which means earlier translations can be reused. The translation overhead thus decreases quickly over time if most of the critical loops fit in the translator cache. But it also means that only code with a relatively small footprint will run fast, i.e., reach the promised 40-65% of native performance.
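
To make the mechanism concrete, here is a toy model (our own illustration, not Eltech's code; only the 1 MB figure comes from above, and the 256-byte average block size is an assumption) of a translate-on-first-use code cache. A hot loop that fits in the cache pays the x86-to-ARM translation cost once and then reuses the cached code, while a footprint larger than the cache keeps evicting and re-translating blocks:

    /* Toy model of a translator code cache (illustration only; not Eltech's
     * implementation, only the 1 MB budget is taken from the article text).
     * Translated blocks live in a direct-mapped cache keyed by guest address;
     * a miss stands for paying the x86 -> ARM translation cost again. */
    #include <stdio.h>

    #define CACHE_BUDGET (1u << 20)              /* 1 MB of translated code      */
    #define BLOCK_SIZE   256u                    /* assumed avg translated block */
    #define N_SLOTS      (CACHE_BUDGET / BLOCK_SIZE)

    static unsigned long slot[N_SLOTS];          /* guest address cached per slot */
    static long translations;                    /* how often we had to translate */

    static void run_block(unsigned long guest_pc)
    {
        unsigned idx = (unsigned)(guest_pc / BLOCK_SIZE) % N_SLOTS;
        if (slot[idx] != guest_pc) {             /* miss: (re)translate the block */
            slot[idx] = guest_pc;
            translations++;
        }
        /* ...a real translator would now jump into the cached ARM code... */
    }

    static long simulate(unsigned long blocks, int iterations)
    {
        translations = 0;
        for (unsigned i = 0; i < N_SLOTS; i++) slot[i] = ~0ul;  /* empty cache */
        for (int it = 0; it < iterations; it++)
            for (unsigned long b = 0; b < blocks; b++)
                run_block(0x400000ul + b * BLOCK_SIZE);
        return translations;
    }

    int main(void)
    {
        /* A hot loop whose translation fits in the cache is translated once. */
        printf("small footprint: %ld translations\n", simulate(1000, 10000));
        /* A footprint of 2x the cache thrashes: every block is retranslated. */
        printf("large footprint: %ld translations\n", simulate(2 * N_SLOTS, 100));
        return 0;
    }

Real translators use smarter eviction and chain translated blocks together, but the footprint effect is the same: once the working set exceeds the cache, the translation overhead stops amortizing.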

Most server applications have a relatively large instruction memory footprint, so it is unclear whether this approach will be of much help for heavy server software. Some HPC workloads do have a small instruction footprint, but since HPC users tend to chase performance above all, this technology is unlikely to convince them to use ARM servers instead of x86.

In general, the BT software will be useful in the (not uncommon) case where a complex web application is composed of multiple software modules and one small piece of software is not open source, with the vendor not offering an ARM-based binary. So the Eltech solution does handle a small piece of the puzzle. x86 emulation is thus a nice-to-have feature, but most ARM-based servers will be running fully optimized, recompiled Linux software. That is the target market for products such as the Boston Viridis.
Comments

  • magnimus1 - Thursday, October 18, 2012 - link

    I don't get it. I understand this is trying to aim for niche low-power markets... but what can this do that Atom microservers can't, and without the translation?
  • deltatux - Thursday, October 18, 2012 - link

    If I'm not mistaken, ARM chips are generally cheaper to produce and cost less than x86 counterparts like Atom.

    Also, as much as Atom has improved on lowering the power requirements for an x86 processor, Intel still hasn't scaled it down to the mW range in which many ARM processor designs live...
  • Penti - Thursday, October 18, 2012 - link

    It's not really about power in that way; you can get an Atom system to use less. The Z2460 CPU core/SoC uses basically 500-750 mW. No real difference there.

    ARM is more flexible here, as you can integrate more stuff, choose from more vendors and do more custom work. Calxeda (used here), for example, has five 10Gb links integrated into every quad-core chip. It also has management functions built in, and it has a lot more PCIe lanes than Atom CPUs, so it offers a lot more I/O bandwidth and a more flexible fabric than Atom.

    Plus, remember we are talking about out-of-order, superscalar, speculative-issue, branch-predicting chips with a strong FPU to boot. ARM is not ARM7 or ARM11 any more; it's a modern architecture, so it can of course compete even if they don't tend to target the high end. The architecture isn't really any less capable than other high-end architectures, and several of those still compete with Intel. It's a lot about what you're packaging (integrating) into the SoC: you can configure it for low power, slightly higher power or higher clocks depending on how you build it, and you can always integrate custom blocks to help you out, whether that's security, video encoding, an ISP, a DSP or whatever.

    The Intel Atom architecture is basically less complex than ARM at the moment. It's still fast, but it might not really fit into the fabric of clusters and servers that work together on high-speed I/O for some vendors. AMD goes the other way around and will implement ARM inside some of their x86-64 APUs; other companies besides those who do x86 can't really decide what gets integrated with an x86 core or in an x86 SoC.
  • yyrkoon - Friday, October 19, 2012 - link

    "The Intel Atom architecture is basically less complex than ARM at the moment."

    I'd argue that ARM is less complex, but from vendor to vendor and device to device it is also more special-purpose, with lots of variety in options from one device to the next. Atom, on the other hand, has that "x86 wart" of an instruction set and is also more general-purpose. One size fits all.

    Not to mention the obvious consumer versus enterprise application processor differences.
  • Penti - Friday, October 19, 2012 - link

    Well, ARM has a sizable front end now, multiple FPUs, pretty large caches and so on. We are not talking about classic-pipelined, in-order RISC cores with fewer than 100,000 transistors any more. If you take the system approach, you get plenty of parts to help you out that Intel won't have on a server-oriented chip. As well as on consumer parts of course, but you do now get the ISP, video encoder, GPU etc. in their handheld parts, and those parts are generally smaller (in die size) than ARM SoCs.

    A Tegra 2, for example, has more than twice the transistors of the original Atom with integrated graphics (Lincroft). Just the ARM core has about as many transistors as, and in most cases more than, the old Atom core there, thanks to the complex design and large cache. The Cortex-A9, excluding L2 cache, is probably as large as or larger than the Bonnell core excluding cache. It's not like Intel's parts will be physically larger than ARM parts; it's all about other power-saving design techniques, processes and what you target.

    An A9 core is 26 million transistors; add in 50 million transistors for 1 MB of cache and that is around 102 million for a dual-core without any other logic than the CPU cores. System logic adds to that, the GPU adds to that, and so on. High-end ARM SoCs today use about 0.5B transistors, and at least two to three hundred million of that (I guess) is CPU cores/cache in the larger designs.
  • Penti - Friday, October 19, 2012 - link

    This all means that ARM is now a lot more complex than a superscalar PowerPC from a few years ago: a lot larger, with a higher transistor count. It's nowhere near where ARM was in the early 90s.
  • patrickjchase - Friday, October 19, 2012 - link

    I disagree with the statement that "The [ARM] architecture isn't really any less capable than other high-end architectures". The current versions (up to and including ARMv7) have one critical liability: they don't support 64-bit virtual addressing. The A15 has a PAE-like hack to support >4 GB of *physical* memory, but that's actually far less helpful than you might think.

    Modern programming practices for server applications (databases etc.) make heavy use of memory-mapped file I/O. To do that efficiently with modern data/file sizes you absolutely must have 64-bit virtual addressing. This is true even for systems like Calxeda that have less than 4 GB of physical memory per node.
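
    As a generic sketch of that pattern (plain POSIX code, our own illustration, not tied to any particular database or to the Viridis): mmap() has to find contiguous virtual address space for the whole file, even though only the pages actually touched occupy physical memory, which is why the file size rather than the installed RAM sets the addressing requirement.

        /* Generic POSIX sketch: memory-map a large data file in one piece.
         * mmap() needs virtual address space for the whole file, even though
         * only touched pages consume physical memory. On a 32-bit process this
         * fails for files approaching 4 GB; with 64-bit VA it is routine. */
        #include <fcntl.h>
        #include <stdint.h>
        #include <stdio.h>
        #include <sys/mman.h>
        #include <sys/stat.h>
        #include <unistd.h>

        int main(int argc, char **argv)
        {
            if (argc != 2) {
                fprintf(stderr, "usage: %s <datafile>\n", argv[0]);
                return 1;
            }

            int fd = open(argv[1], O_RDONLY);
            if (fd < 0) { perror("open"); return 1; }

            struct stat st;
            if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

            if ((unsigned long long)st.st_size > SIZE_MAX) {
                fprintf(stderr, "file is larger than this process's address space\n");
                return 1;
            }

            /* Reserve virtual address space for the entire file in one go. */
            void *base = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
            if (base == MAP_FAILED) {
                perror("mmap");              /* typically ENOMEM on a 32-bit process */
                return 1;
            }

            /* The file can now be read as ordinary memory; the kernel pages it in. */
            printf("mapped %lld bytes at %p\n", (long long)st.st_size, base);

            munmap(base, (size_t)st.st_size);
            close(fd);
            return 0;
        }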

    In my opinion ARM won't really be a viable competitor in the server space until ARMv8-based CPUs become available in 2014.
  • Penti - Friday, October 19, 2012 - link

    Depends on how you look at it; Cortex-A15 gets LPAE and virtualization features, and ARMv8 chips are in the making both at ARM Ltd and as custom designs by licensees. Thus it actually supports more memory than Atom, if you talk system-wide. I think of them more as computing elements, similar to the PPC cores in IBM supercomputers, small clusters and distributed computing, as well as for workgroup-size stuff. The Avoton Atom (22nm) and the more pressing Centerton (Saltwell) from Intel will only support 8GB of RAM (at least for now); they are really only competing against AMD. Databases might not be the target use here, and I don't think that will change with ARMv8 either. Nor do you need to run your database server and workload on either x86-64 or ARM; you have 2-3 other alternatives. My point is just that it is already more complex than x86 processors from the beginning of the 2000s, and it generally supports a higher amount of memory than the average x86 system with early PAE support. The race is all about software and platform, rather than power usage and transistor count.

    ARM's LPAE isn't really any more useful than PAE was on 32-bit Xeon servers, though it was even more pressing there, as most 32-bit apps only supported addressing 2GB of memory on Windows. That doesn't mean it is like a NetBurst-based Xeon from 2001 or a Pentium Pro from 1996 in terms of architecture and internals; many parts of the architecture are actually pretty far ahead here. Some of course aren't, but 64-bit cores are up to the semiconductor companies: several will have their own custom designs, and neither the ISA nor the system IP to connect those chips will be the limit. ARM's own cores will improve, but they might not be the ones that actually target and go up against high-end chips and applications at SoC vendors and customers. But then, Atom microservers aren't very high-end either. Design/silicon-wise, a modern ARM core is about as complex as a 64-bit PPC core with VMX, with similar features and transistor count. More importantly, today's Atoms are actually simpler in architecture and smaller in size. I really didn't mean to compare it to Haswell-EX or suggest any such thing :) They are large and capable chips by now, though, not small, resource-efficient ones that are cheap to manufacture.
  • yyrkoon - Friday, October 19, 2012 - link

    ARM processors are cheaper to manufacture because, compared to an x86 processor, they are relatively simple. This is not to say that one is better than the other, but some (many?) feel that the x86 instruction set is not needed and is very "bloated". Usually, though, these types of people are embedded systems designers.

    ARM processors save power by entering several different sleep states, usually while little or no processing is needed, and they can switch from one state to the next very quickly. So, as an example, it would not be uncommon to see code that executes every 100 ms put the core into a sleep state after the work has completed, while waiting for the next execution cycle.
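
    A minimal Linux user-space sketch of that pattern (our own illustration, not from any vendor SDK): the task wakes every 100 ms, does its work and then sleeps until the next deadline, during which the kernel's idle loop can drop the ARM core into one of those sleep states (WFI or a deeper state) until the timer fires.

        /* Sketch of a periodic 100 ms task; while it sleeps, the idle core can
         * be put into a low-power state by the kernel and woken by the timer. */
        #define _POSIX_C_SOURCE 200809L
        #include <time.h>

        static void do_periodic_work(void)     /* hypothetical 100 ms workload */
        {
            /* ...sample sensors, service a queue, etc... */
        }

        int main(void)
        {
            struct timespec next;
            clock_gettime(CLOCK_MONOTONIC, &next);

            for (;;) {
                do_periodic_work();

                /* Next deadline is exactly 100 ms after the previous one. */
                next.tv_nsec += 100L * 1000 * 1000;
                if (next.tv_nsec >= 1000000000L) {
                    next.tv_nsec -= 1000000000L;
                    next.tv_sec  += 1;
                }
                /* Sleep (and let the core power down) until that absolute time. */
                clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
            }
        }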

    Recently, I read an article on a new M0-core ARM MCU that used 50 mW per MHz, which, for what the processor is, is a bit much if you ask me. At 487 MHz... well, you do the math. Then consider that *this* processor is nothing compared to what runs in modern phones/tablets, let alone a server application.

    Really though, it all boils down to "just enough processor". Personally, I would rather use ARM processors in many cases. With that said, based on nothing but pure "horsepower", ARM cannot hope to keep up. Yet.
  • patrickjchase - Friday, October 19, 2012 - link

    We've been down this road before...

    Cast your mind back to, say, 1993. Back then the RISC camp were beginning to introduce out-of-order designs. Intel had the dual-issue, in-order Pentium. At the time it looked like there was a significant penalty to x86, and people predicted eternal RISC dominance.

    Then a funny thing happened: As the semiconductor processes evolved and more gates became available, the relatively fixed x86 "complex decode" overhead started to vanish relative to all of the other costs of a competitive microarchitecture and x86 caught up, starting with Pentium Pro.

    The embedded/low-power processor space is in basically the same place today as the discrete CPU market was in 1993: ARM (and MIPS) are shipping moderately OoO designs like A9, Krait, and A15. Intel has the static Atom. The RISC camp are winning the power/performance contest at the moment, Intel propaganda notwithstanding.

    That will change as complex decode becomes a smaller portion of the overall processor cost - I expect that the Silvermont Atom will do quite well against its ARM/MIPS contemporaries, particularly when one considers Intel's ~half-node process technology advantage over the likes of TSMC.

    At that point the only thing ARM will have going for them is their business model, specifically the fact that anybody who wants to can buy a license and design their own tailored SoC around ARM cores. Unfortunately the economics of chip design/fabrication are eroding that business model. With each new process generation the fixed cost of a design increases by ~50%, and the number of players that can afford to do their own design decreases accordingly. I suspect that in the long run the Intel model a la Medfield will be perceived as much less onerous than it is today.
