Intel Disables TSX Instructions: Erratum Found in Haswell, Haswell-E/EP, Broadwell-Y
by Ian Cutress on August 12, 2014 8:20 PM EST
One of the main features Intel was promoting at the launch of Haswell was TSX – Transactional Synchronization eXtensions. In our analysis, Johan explains that TSX enables the CPU to process a series of traditionally locked instructions on a dataset in a multithreaded environment without locks, allowing each core to potentially violate each other’s shared data. If the series of instructions is computed without this violation, the code passes through at a quicker rate – if an invalid overwrite happens, the code is aborted and takes the locked route instead. All a developer has to do is link in a TSX library and mark the start and end parts of the code.
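As a rough illustration of that "try without the lock, fall back to the lock on abort" flow, here is a minimal C++ sketch. It assumes GCC or Clang and uses the RTM intrinsics `_xbegin`, `_xend` and `_xabort` from `<immintrin.h>`; the names `FallbackLock`, `cpu_reports_rtm`, `try_transactional_add` and `add_with_elision`, as well as the retry count of three, are made up for this example. Because executing `XBEGIN` on a CPU that does not report RTM is an illegal instruction, the sketch checks CPUID first and simply takes the locked route when the feature is absent or disabled.

```cpp
#include <atomic>
#include <cassert>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>      // __get_cpuid_count (GCC/Clang header)
#include <immintrin.h>  // _xbegin, _xend, _xabort, _XBEGIN_STARTED
#define SKETCH_X86 1
#else
#define SKETCH_X86 0
#endif

// Hypothetical fallback lock: a plain spinlock whose state a transaction can read.
struct FallbackLock {
    std::atomic<bool> held{false};
    void lock() { while (held.exchange(true, std::memory_order_acquire)) {} }
    void unlock() {
        assert(held.load(std::memory_order_relaxed));  // must be held by the caller
        held.store(false, std::memory_order_release);
    }
    bool is_held() const { return held.load(std::memory_order_relaxed); }
};

// True if CPUID.(EAX=7,ECX=0):EBX bit 11 (RTM) is set on this CPU.
inline bool cpu_reports_rtm() {
#if SKETCH_X86
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) return false;
    return ((ebx >> 11) & 1u) != 0;
#else
    return false;
#endif
}

#if SKETCH_X86
// One speculative attempt. Returns false if the transaction aborted, in which
// case the hardware has already rolled back any write made inside it.
__attribute__((target("rtm")))
inline bool try_transactional_add(FallbackLock& lock, long& counter, long delta) {
    unsigned status = _xbegin();
    if (status != _XBEGIN_STARTED) return false;  // execution resumes here on abort
    // Reading the lock word puts it in the read set, so a thread that later
    // takes the locked route aborts this transaction.
    if (lock.is_held()) _xabort(0xff);
    counter += delta;
    _xend();  // commit
    return true;
}
#endif

// Try the transactional route a few times, then take the locked route.
inline void add_with_elision(FallbackLock& lock, long& counter, long delta) {
#if SKETCH_X86
    static const bool use_rtm = cpu_reports_rtm();
    if (use_rtm) {
        for (int attempt = 0; attempt < 3; ++attempt) {
            if (try_transactional_add(lock, counter, delta)) return;
        }
    }
#endif
    lock.lock();
    counter += delta;
    lock.unlock();
}
```

Calling `add_with_elision(lock, counter, 1)` from several threads gives the same result either way; on a CPU that does not report RTM, every call goes through the lock.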
News coming from Intel’s briefings in Portland last week boil down to an erratum found with the TSX instructions. Tech Report and David Kanter of Real World Technologies are stating that a software developer outside of Intel discovered the erratum through testing, and subsequently Intel has confirmed its existence. While errata are not new (Intel’s E3-1200 v3 Xeon CPUs already have 140 of them), what is interesting is Intel’s response: to push through new microcode to disable TSX entirely. Normally a microcode update would suggest a workaround, but it would seem that this is a fundamental silicon issue that cannot be designed around, or intercepted at an OS or firmware/BIOS level.
Intel has had numerous issues similar to this in the past, such as the FDIV bug, the f00f bug and more recently, the P67 B2 SATA issues. In each case, the bug was resolved by a new silicon stepping, with certain issues (like FDIV) requiring a recall, similar to recent issues in the car industry. This time there are no recalls; the feature just gets disabled via a microcode update.
The main focus of TSX is in server applications rather than consumer systems. It was introduced primarily to aid database management and other tools more akin to a server environment, which is reflected in the fact that enthusiast-level consumer CPUs have it disabled (except Devil’s Canyon). Now it will come across as disabled for everyone, including the workstation and server platforms. Intel is indicating that programmers who are working on TSX enabled code can still develop in the environment as they are committed to the technology in the long run.
Overall, this issue affects all of the Haswell processors currently in the market, the upcoming Haswell-E processors and the early Broadwell-Y processors under the Core M branding, which are currently in production. The issue has been found too late in the day for a fix to be introduced to these platforms, although we might imagine that the next stepping all around will have a suitable fix. Intel states that its internal designs have already addressed the issue.
Intel is recommending that Xeon users that require TSX enabled code to improve performance should wait until the release of Haswell-EX. This tells us two things about the state of Haswell: for most of the upcoming LGA2011-3 Haswell CPUs, the launch stepping might be the last, and the Haswell-EX CPUs are still being worked on. That being said, if the Haswell-E/EP stepping at launch is not the last one, Intel might not promote the fact – having the fix for TSX could be a selling point for Broadwell-E/EP down the line.
For those that absolutely need TSX, it is being said that TSX can be re-enabled through the BIOS/firmware menu should the motherboard manufacturer decide to expose it to the user. Reading through Intel’s official errata document, we can confirm this.
We are currently asking Intel what the required set of circumstances is to recreate the issue, but the erratum states ‘a complex set of internal timing conditions and system events … may result in unpredictable system behaviour’. There is no word if this means an unrecoverable system state or memory issue, but any issue would not be in the interests of the buyers of Intel’s CPUs who might need it: banks, server farms, governments and scientific institutions.
At the current time there is no road map for when the fix will be in place, and no public date for the Haswell-EX CPU launch. It might not make sense for Intel to re-release the desktop Haswell-E/EP CPUs, and in order to distinguish them it might be better to give them all new CPU names. However the issue should certainly be fixed with Haswell-EX and desktop Broadwell onwards, given that Intel confirms they have addressed the issue internally.
Source: Twitter, Tech Report
63 Comments
barleyguy - Thursday, August 14, 2014 - link
That's not an equal analogy, because hardware AES is a huge performance gain, and software AES isn't difficult (just use a peer reviewed open source library at the 128 byte level and wrap it in some multiplexing code). There is no "hard" way to do AES in software that's as fast as the hardware instructions, AFAIK.
TSX is also a huge performance gain over the "easy" method of using blocking locks, but likely has comparable performance to the "hard" way of compare and set. So in that sense, it's less of a real gain, assuming of course that you're not the person paying the developers.
psyq321 - Wednesday, August 13, 2014 - link
It is not a convenience feature. In fact, it is >easier< to write the code without it.
Have potentially contended data (between threads)? Just lock the sucker, that's the easiest way.
But that is not the most efficient way. Basically, what TSX does is, it relies on CPU smarts to speed up multi-threaded code >without< having to resort to even more complex (and error-prone) lock-free programming. In that way, one can call TSX a "convenience", but in reality it does require additional work, just not that much additional work.
TSX is roughly comparable to, say, SSE instructions (but it is not nearly as useful in terms of potential applications). In order to use SSE with some decent speedup, you have to put a bit more effort into your code, so it is not really a "convenience", as it requires the developer to do more work in order to achieve faster code execution.
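To make the contrast in the two comments above concrete, here is a small C++ sketch of the same shared update written both ways: the "easy" blocking lock and the "hard" compare-and-set retry loop. The task (tracking a shared maximum) and the names `LockedMax`, `LockFreeMax` and `run_workers` are invented for illustration; nothing here uses TSX itself, and it says nothing about relative performance.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// "Easy" route: take a blocking lock around the update.
struct LockedMax {
    std::mutex m;
    long value = 0;
    void update(long candidate) {
        std::lock_guard<std::mutex> guard(m);
        if (candidate > value) value = candidate;
    }
    long get() {
        std::lock_guard<std::mutex> guard(m);
        return value;
    }
};

// "Hard" route: lock-free compare-and-set retry loop.
struct LockFreeMax {
    std::atomic<long> value{0};
    void update(long candidate) {
        long seen = value.load(std::memory_order_relaxed);
        // On failure compare_exchange_weak reloads `seen`, so each retry
        // compares the candidate against the latest published value.
        while (candidate > seen &&
               !value.compare_exchange_weak(seen, candidate,
                                            std::memory_order_relaxed)) {
        }
    }
    long get() const { return value.load(); }
};

// Hypothetical driver: each worker offers its own range of candidates.
template <typename MaxT>
long run_workers(int workers, int per_worker) {
    assert(workers > 0 && per_worker > 0);
    MaxT shared;
    std::vector<std::thread> pool;
    for (int w = 0; w < workers; ++w) {
        pool.emplace_back([&shared, w, per_worker] {
            for (int i = 1; i <= per_worker; ++i) {
                shared.update(static_cast<long>(w) * per_worker + i);
            }
        });
    }
    for (auto& t : pool) t.join();
    return shared.get();
}
```

Both versions end with the same answer; the lock-free one never blocks, but the retry loop is exactly the kind of code that is easy to get subtly wrong, which is the gap TSX was meant to narrow.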
eachus - Tuesday, August 19, 2014 - link
The issue of how to do multi-processor locks is complex. Until a decade ago, you used a semaphore or mutex, and took the timing hit of marking the lock as uncacheable. Hundreds of clocks even if you weren't modifying the lock. (Usually you would use a read-modify-write (RMW) instruction to put the identity of your thread in the lock and check if the lock was not reserved by comparing the returned value to zero, or -1 or whatever.) When you release the lock, you first have to check if it still contains the id that you put in, then either release the lock or turn control over to a waiting thread. (I'm simplifying a lot, so don't shoot me.) Anyway, three main memory reads, or RMW cycles in the best case.
Then along came Opteron. Opterons, with all IO passing through a CPU chip, and cache-coherency connections to all other CPU chips, meant that requests for uncacheable memory could be ignored. The locks worked as before, but now you could have a fast (possibly 3 CPU clock latency) read or RMW cycle. I never measured 100x performance improvements unless thrashing was involved, but that tells the real story. It was no longer about how slow uncached memory was, but about how much more work your database or other application could do before it hit thrashing. Many times I could wind the CPUs up to 100% load for minutes at a time without starting to thrash, which was a very good thing. (Some other CISC and RISC CPUs also support/supported IOMMUs. Most that didn't are now dead.)
When Intel added IOMMU support to their x86/x64 CPUs they didn't duplicate what AMD had done. (And it now works on all AMD CPUs including single socket desktop chips, and ARM64 chips.) This is because AMD uses a MOESI protocol and Intel uses MESIF. Again way too much detail for here, but it means that in certain locking cases, you get a (relatively) slow ping-pong effect when two threads are sequentially accessing a lock. (Think producer/consumer.) By treating the transaction as speculative, and never touching the lock if there is no conflict, overall transactions are sped up.
AMD proposed a similar instruction set extension (ASF) in 2009, but AFAIK the best locking code was not significantly improved (on AMD CPUs) and the proposal has languished. Will this bug kill TSX? Probably not. There is an awful lot of ancient history embedded in the x86 ISA, this will just be a bit more. But I expect the effect on potential users to be the same as AMD's ASF leading programmers to better lock implementations.
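The thread-id lock described at the start of this comment can be sketched in a few lines of C++. This is only an approximation of that description: it uses a compare-and-swap rather than an unconditional exchange, treats 0 as "free", spins instead of handing control to a waiting thread, and the names `OwnerLock`, `try_acquire` and `release` are invented for the example.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

struct OwnerLock {
    std::atomic<std::uint32_t> owner{0};  // 0 means "free" (an assumption of this sketch)

    // One RMW: install our id only if the lock word currently reads 0.
    bool try_acquire(std::uint32_t my_id) {
        assert(my_id != 0);  // 0 is reserved for the free state
        std::uint32_t expected = 0;
        return owner.compare_exchange_strong(expected, my_id,
                                             std::memory_order_acquire,
                                             std::memory_order_relaxed);
    }

    // Spin until the lock is ours; a real implementation would back off or block.
    void acquire(std::uint32_t my_id) {
        while (!try_acquire(my_id)) {
        }
    }

    // Release only if the word still holds the id we put in; returns false otherwise.
    bool release(std::uint32_t my_id) {
        std::uint32_t expected = my_id;
        return owner.compare_exchange_strong(expected, 0u,
                                             std::memory_order_release,
                                             std::memory_order_relaxed);
    }
};
```

Every acquire and release here is a read-modify-write on the same lock word, which is the traffic that bounces between caches when two threads take turns on a lock, and which a speculative transaction avoids when there is no conflict.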
ABR - Wednesday, August 13, 2014 - link
How do they "push a microcode update" to a CPU that's already out in the wild?
ObstinateMuon - Wednesday, August 13, 2014 - link
The same way they did last time. Via the universal backdoor in Windows.
Alexvrb - Wednesday, August 13, 2014 - link
BIOS updates dude. :-/
ObstinateMuon - Wednesday, August 13, 2014 - link
You're right. It doesn't matter which OS you run. They can instead choose to do it remotely via AMT. See http://www.fsf.org/blogs/community/active-manageme...
The_Assimilator - Wednesday, August 13, 2014 - link
Please stop drinking the Stallman kool-aid, it just makes it obvious that you're an idiot.
maxpwr - Wednesday, August 13, 2014 - link
There is a reason why Russia is designing a new chip to replace all Intel CPUs.
psyq321 - Wednesday, August 13, 2014 - link
No, the reason Russia is designing a new chip is because they want to protect their market (protect, as in economic protection, not security).
Russia is not designing a new CPU architecture anyway, they will use ARM architecture. Now, if you think somebody can spot a deliberate security flaw in a CPU design consisting of hundreds of millions of blocks, yeah... good luck with that. The only way to be 100% sure is to design it by yourself from scratch, and even that does not guarantee it won't have flaws that can be silently exploited.
In any case, Intel's Microcode has nothing whatsoever to do with this. You can simply prevent any theoretical possibility that somebody can use Intel AMT against you by simply not giving it access to public Internet.
Not to mention that Microcode updates are not persistent, and have to be applied after every power-on. If you do not allow BIOS upgrades and control the OS and do not allow public Internet access to the system firmware, there is simply no way somebody can exploit your CPU remotely.