The Intel SSD DC S3700: Intel's 3rd Generation Controller Analyzed
by Anand Lal Shimpi on November 5, 2012 12:01 PM EST
Today Intel is announcing its first SSD based on its own custom 6Gbps SATA controller. This new controller completely abandons the architecture of the old X25-M/320/710 SSDs and adopts an all new design with one major goal: delivering consistent IO latency.
All SSDs tend to fluctuate in performance as they alternate between writing to clean blocks and triggering defrag/garbage collection routines with each write. Under sequential workloads the penalty isn't all that significant; under heavy random IO, however, it can be a real problem. The occasional high latency blip can be annoying on a client machine (OS X doesn't respond particularly well to random high IO latency), but it's typically nothing more than a rare hiccup. Users who operate their drives closer to full capacity will find these hiccups to be more frequent. In a many-drive RAID array, however, blips of high latency from each drive can destructively work together to reduce the overall performance of the array. In very large RAID arrays (think dozens of drives) this can be an even bigger problem.
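To see why blips compound in large arrays, here is a back-of-the-envelope sketch. It rests on two assumptions that are not from Intel or from any measurement: each drive independently spends a fraction `p_blip` of its time in a high-latency state, and a striped request that touches every drive is only as fast as its slowest member. The function name and the 1% figure are purely illustrative.

```python
def stalled_fraction(p_blip: float, n_drives: int) -> float:
    """Probability that at least one of n independent drives is mid-blip."""
    return 1.0 - (1.0 - p_blip) ** n_drives


if __name__ == "__main__":
    p = 0.01  # assumed per-drive chance of being in a high-latency state
    for n in (1, 4, 8, 24):
        share = stalled_fraction(p, n)
        print(f"{n:>2} drives: {share:.1%} of full-stripe IOs hit a blip")
```

Under these assumptions a hiccup that is rare on a single drive becomes a routine event once a couple dozen drives sit behind the same stripe.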
In the past, we've recommended simply increasing the amount of spare area on your drive to combat these issues - a sort of bandaid that would allow the SSD controller to better do its job. With its latest controller, Intel tried to solve the root cause of the problem.
The launch vehicle for Intel's first 6Gbps SATA controller is unsurprisingly a high-end enterprise drive. Since the 2008 introduction of the X25-M, Intel has shifted towards prioritizing the enterprise market. All divisions of Intel have to be profitable and with high margins. The NAND Solutions Group (NSG) is no exception to the rule. With consumer SSDs in a race to the bottom in terms of pricing, Intel's NSG was forced to focus on an area that wouldn't cause mother Intel to pull the plug on its little experiment. The enterprise SSD market is willing to pay a premium for quality, and thus it became Intel's primary focus.
The first drive to use the new controller also carries a new naming system: the Intel SSD DC S3700. The DC stands for data center, which bluntly states the target market for this drive. While it's quite likely that we'll see a version appear in a high-end drive that could be used in a desktop, I don't know that we'll see a mobile version anytime soon for reasons I'll get to later.
The S3700 comes in four capacities (100, 200, 400 and 800GB) and two form factors (2.5" and 1.8"). The 1.8" version is only available at 200GB and 400GB capacities. Intel sees market potential for a 1.8" enterprise SSD thanks to the increasing popularity of blade and micro servers. The new controller supports 8 NAND channels, down from 10 in the previous design as Intel had difficulty hitting customer requested capacity points at the highest performance while populating all 10 channels.
The S3700 is a replacement for the Intel SSD 710, and thus uses Intel's 25nm MLC-HET (High Endurance Technology) NAND. The S3700 is rated for a full 10 drive writes per day (4KB random writes) for 5 years.
|Intel SSD DC S3700 Endurance (4KB Random Writes, 100% LBA)|
|Capacity||100GB||200GB||400GB||800GB|
|Rated Endurance||10DW x 5 years||10DW x 5 years||10DW x 5 years||10DW x 5 years|
|Endurance in PB||1.825 PB||3.65 PB||7.3 PB||14.6 PB|
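The petabyte figures follow directly from the rating. As a quick check, the sketch below multiplies capacity by 10 drive writes per day over 5 years; the 365-day year and decimal units (1 PB = 1,000,000 GB) are assumptions, chosen because they reproduce the numbers in the table.

```python
DRIVE_WRITES_PER_DAY = 10
YEARS = 5
DAYS_PER_YEAR = 365  # assumption: leap days ignored


def rated_endurance_pb(capacity_gb: int) -> float:
    """Total data written over the rated life, in decimal petabytes."""
    total_gb = capacity_gb * DRIVE_WRITES_PER_DAY * DAYS_PER_YEAR * YEARS
    return total_gb / 1_000_000  # decimal GB -> PB


if __name__ == "__main__":
    for capacity in (100, 200, 400, 800):
        print(f"{capacity}GB: {rated_endurance_pb(capacity):.3f} PB")
```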
That's the worst case endurance on the drive; if your workload isn't purely random you can expect even more writes out of the S3700. Compared to the SSD 710, the S3700 sees an increase in endurance even without allocating as much NAND as spare area (~32% vs. 60% on the 710). The increase in endurance even while decreasing spare area comes courtesy of the more mature 25nm MLC-HET process. It's process maturity that's also responsible for Intel not using 20nm NAND on the S3700. We'll eventually see 20nm MLC-HET NAND, but just not now.
Pricing is also much more reasonable than the Intel SSD 710. While the 710 debuted at around $6.30/GB, the Intel SSD DC S3700 is priced at $2.35/GB. It's still more expensive than a consumer drive, but the S3700 launches at the most affordable cost per GB of any Intel enterprise SSD. A non-HET version would likely be well into affordable territory for high-end desktop users.
|Intel SSD DC S3700 Pricing (MSRP)|
The third generation Intel controller supports 6Gbps SATA and full AES-256 encryption. The controller is paired with up to 1GB of ECC DRAM (more on this later). Intel does error correction on all memories (NAND, SRAM and DRAM) in the S3700.
Like previous enterprise drives, the S3700 features on-board capacitors to commit any data in flight on the drive to NAND in the event of a power failure. The S3700 supports operation on either 12V, 5V or both power rails - a first for Intel. Power consumption is rated at up to 6W under active load (peak power consumption can hit 8.2W), which is quite high and will keep the S3700 from being a good fit for a notebook.
Performance & IO Consistency
Performance is much greater than any previous generation Intel enterprise SATA SSD:
|Enterprise SSD Comparison|
|Intel SSD DC S3700||Intel SSD 710||Intel X25-E||Intel SSD 320|
|Capacities||100 / 200 / 400 / 800GB||100 / 200 / 300GB||32 / 64GB||80 / 120 / 160 / 300 / 600GB|
|NAND||25nm HET MLC||25nm HET MLC||50nm SLC||25nm MLC|
|Max Sequential Performance (Reads/Writes)||500 / 460 MBps||270 / 210 MBps||250 / 170 MBps||270 / 220 MBps|
|Max Random Performance (Reads/Writes)||76K / 36K IOPS||38.5K / 2.7K IOPS||35K / 3.3K IOPS||39.5K / 600 IOPS|
|Endurance (Max Data Written)||1.83 - 14.6PB||500TB - 1.5PB||1 - 2PB||5 - 60TB|
|Power Safe Write Cache||Y||Y||N||Y|
Intel is also promising performance consistency with its S3700. At steady state Intel claims the S3700 won't vary its IOPS by more than 10 - 15% for the life of the drive. Most capacities won't see more than a 10% variance in IO latency (or performance) at steady state. Intel has never offered this sort of a guarantee before because its drives would vary quite a bit in terms of IO latency. The chart below shows individual IO latency at steady state (displayed in IOPS to make the graph a bit easier to read) for Intel's SSD 710:
Note the insane distribution of IOs. This isn't just an Intel SSD issue; Samsung's SSD 840 Pro and the SandForce based 330 behave similarly. All of these drives show anywhere from a 2x - 10x gap between worst and best case random write performance over time. Lighter workloads won't look as bad, and having more spare area will help keep performance high, but Intel claims the S3700 is able to tighten its IO latency down to a narrow band of about 10 - 15% variance.
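The article doesn't spell out how Intel computes its variance figure, so the sketch below uses two simple stand-ins: the best-to-worst ratio, and the max-to-min spread as a percentage of the mean. Both definitions and both sample series are assumptions made up for illustration, not measured data.

```python
from statistics import mean


def best_to_worst_ratio(iops):
    """How many times faster the best sample is than the worst."""
    return max(iops) / min(iops)


def spread_pct(iops):
    """(max - min) expressed as a percentage of the mean."""
    return 100.0 * (max(iops) - min(iops)) / mean(iops)


# Hypothetical steady-state samples, one value per measurement interval.
erratic = [4000, 12000, 2500, 20000, 3000, 15000]
steady = [33000, 35000, 34000, 36000, 34500, 35500]

if __name__ == "__main__":
    for name, series in (("erratic", erratic), ("steady", steady)):
        print(f"{name}: ratio {best_to_worst_ratio(series):.1f}x, "
              f"spread {spread_pct(series):.1f}%")
```

The first series is the kind of multi-x gap described above; the second stays inside a band of roughly the size Intel is promising.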
Intel also claims to be able to service 99.9% of all 4KB random IOs (QD1) in less than 500µs:
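A claim of this form can be checked against a latency trace by counting the share of IOs that finish under the limit. This is a minimal sketch: the function names and the trace are hypothetical, and Intel's own test methodology is not described here.

```python
def fraction_within(latencies_us, limit_us=500.0):
    """Share of IOs that completed in strictly less than limit_us."""
    if not latencies_us:
        raise ValueError("need at least one latency sample")
    fast = sum(1 for latency in latencies_us if latency < limit_us)
    return fast / len(latencies_us)


def meets_target(latencies_us, limit_us=500.0, target=0.999):
    """True if at least `target` of the IOs beat the latency limit."""
    return fraction_within(latencies_us, limit_us) >= target


if __name__ == "__main__":
    # Hypothetical trace: 9,995 fast IOs and 5 slow outliers (microseconds).
    trace = [90.0] * 9995 + [2000.0] * 5
    print(meets_target(trace))
```

Note that a 99.9% target tolerates only 10 slow IOs per 10,000, which is why a drive with frequent garbage collection stalls can't meet it.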
To understand how the S3700 achieves this controlled IO latency, we need to know a bit more about Intel's older controllers. In researching for this article, I managed to learn more about Intel's first SSD controller than I ever knew before.
Kevin G - Monday, November 5, 2012
There is mention of a large capacitor to allow for writing the cache to NAND in the event of a power failure.
There are a couple of things Intel can do in this event to eliminate the possibility of cache corruption.
First is write-through of any immediate change to the indirection tables. The problem of coherence between the cache and NAND would still exist but wouldn't require writing the entire cache to NAND. Making the DRAM cache write-through would impact the write/erase cycles of the drive but I'm uncertain of the magnitude in comparison to heavy write IO.
The second option is that if the DRAM is used to create an optimized version of the directory tables for read only purposes, the old table in the NAND would still be valid (unless there needs to be change due to a write). Thus power loss would only lose the optimized table in DRAM but the unoptimized would still be functional in the NAND.
The third option involves optimized tables being written to disk while the unoptimized version is still in use in NAND. The last operation of writing the optimized indirection table to disk would be switching the status of what table is in active use. Thus only the optimized table is put into use after it has successfully been written to NAND. Sudden power failure in this process wouldn't impact the drive.
A fourth idea that comes to mind would be to make a reservation where the next optimized table would exist in NAND. Thus in the event of a sudden power failure, the SSD will use the unoptimized indirection tables but be able to see if anything has been written to the reserved space - it would know if it suffered a power loss and any recovery actions as necessary. This would eat space as the active table, a table being written and space for a future to be written would be 'in use'.
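The third option above amounts to a write-then-switch commit. Here is a toy sketch of that idea, not Intel's implementation: `TableStore`, its two slots, and the `fail_before_switch` flag are all hypothetical, and the flip of the active marker is assumed to be atomic.

```python
class PowerLoss(Exception):
    """Simulated sudden power failure."""


class TableStore:
    """Toy stand-in for NAND holding two table slots and an active marker."""

    def __init__(self, initial_table):
        self.slots = [dict(initial_table), None]
        self.active = 0  # index of the slot currently in use

    def commit(self, new_table, fail_before_switch=False):
        spare = 1 - self.active
        self.slots[spare] = dict(new_table)  # step 1: write the new table in full
        if fail_before_switch:
            raise PowerLoss("power lost before the switch")
        self.active = spare  # step 2: flip the marker last (assumed atomic)

    def current(self):
        return self.slots[self.active]


if __name__ == "__main__":
    store = TableStore({"lba0": "block7"})
    try:
        store.commit({"lba0": "block9"}, fail_before_switch=True)
    except PowerLoss:
        pass
    print(store.current())  # old table is still the active one
    store.commit({"lba0": "block9"})
    print(store.current())  # new table takes over only after a full write
```

Because the marker changes only after the new table is completely written, an interruption at any earlier point leaves the old table in use.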
cdillon - Monday, November 5, 2012
Personally, I don't care if an SSD stores my user data (acknowledged writes, specifically) and/or internal metadata in a DRAM cache as long as it is battery and/or capacitor backed so that cache can be flushed to NAND after a power failure.
I think what I originally intended to say in my first comment was if Intel is not caching user data in DRAM, then what ARE they caching in DRAM that requires the super-capacitors to give them time to write it to NAND? If it isn't user data, then it must be the indirection tables or some other critical internal metadata. This internal metadata is at least as important as the user data itself, so why even make the distinction? The distinction stinks to me as either a marketing ploy or catering to some outdated PHB "requirement" that they need to meet in order to actually sell these drives to some enterprises. I'm not saying it's bad, just odd and probably non-optimal.
Kevin G - Monday, November 5, 2012
It is likely buffering the indirection table writes to reduce the number of NAND writes. Essentially it helps with the drive's overall endurance. How much so would be dependent on just how frequently the indirection table is written to.
The other distinction is that they could be hitting an access time limitation by reading the indirection tables from NAND and then reading the data. By caching this in DRAM, the controller can lower access latencies to the NAND itself.
nexox - Monday, November 5, 2012
Not storing user data in DRAM still helps - it forces the drive controller to actually operate efficiently instead of just fixing problems with more write cache. The indirection table doesn't change all that fast, so there won't be that much of it to flush out to NAND on power loss, but it's easy to build up a lot of user data in write cache, which requires that much more capacitance to get durably written.
And FYI, many SSDs will acknowledge a write when the data hits NAND durably, but will not guarantee that the corresponding indirection table entry is durably stored, so on power failure some blocks may appear to revert to their old state, from before the synced write took place.
Death666Angel - Tuesday, November 6, 2012
"Not storing user data in DRAM still helps - it forces the drive controller to actually operate efficiently instead of just fixing problems with more write cache."
And why should I care how the problem is fixed?
Efficient programming or throwing more hardware at the problem is the same thing for 99% of the usage cases. If maybe power consumption is a problem, then one solution might work better than another, but for the most part, a fix is a fix, at least in my book.
Kevin G - Tuesday, November 6, 2012
How the problem is fixed would matter to enterprise environments where reliability reigns supreme. How an issue is fixed in this area matters in the context of it happening again, just under different circumstances.
In this example, throwing more DRAM as a write cache for SSDs would be appropriate for consumers to address the issue but not necessarily the enterprise market. Keeping data in flash maintains data integrity which matters in scenarios of sudden power failure. The thing is that enterprise markets have a different usage scenario where the large write buffer that resolved the issue for consumers could still be an issue at the enterprise level (i.e. the SSD would need an even larger DRAM buffer).
Bullwinkle J Moose - Monday, November 5, 2012
Did I miss something?
With 1:1 mapping, this sounds like the world's first truly O.S. agnostic controller
Does it require an O.S. with Trim or a partition offset for XP use, or did Intel just make the world's first universal SSD?
The 320 may have handled partition offsets internally but still required Trim for best performance
Please correct me if I'm wrong
jwilliams4200 - Tuesday, November 6, 2012
You're wrong. You have misunderstood how the indirection table works.
iwod - Monday, November 5, 2012
The only new and truly innovative thing in this controller is actually the software side of things: 1:1 mapping and basically a super fast storage table for updating and deleting, using ECC RAM.
Couldn't 70 - 90% of this performance gain be implemented with other controllers if they had large enough ECC DRAM?
Please correct me if I'm wrong
And what is the variation in random I/O in other enterprise-class SSDs like Fusion IO?
MrSpadge - Tuesday, November 6, 2012
To me it sounds like this change requires an entirely different controller design, or at least a checking & rethinking of major parts. Intel surely didn't tell us everything that changed, just the most important result of the changes.