Intel SSD 750 PCIe SSD Review: NVMe for the Client
by Kristian Vättö on April 2, 2015 12:00 PM EST
AnandTech Storage Bench - The Destroyer
The Destroyer has been an essential part of our SSD test suite for nearly two years now. It was crafted to provide a benchmark for very IO-intensive workloads, which is where you most often notice the difference between drives. It's not necessarily the most relevant test to an average user, but for anyone with a heavier IO workload The Destroyer should do a good job of characterizing performance.
AnandTech Storage Bench - The Destroyer
Workload | Description | Applications Used
Photo Sync/Editing | Import images, edit, export | Adobe Photoshop CS6, Adobe Lightroom 4, Dropbox
Gaming | Download/install games, play games | Steam, Deus Ex, Skyrim, Starcraft 2, BioShock Infinite
Virtualization | Run/manage VM, use general apps inside VM | VirtualBox
General Productivity | Browse the web, manage local email, copy files, encrypt/decrypt files, backup system, download content, virus/malware scan | Chrome, IE10, Outlook, Windows 8, AxCrypt, uTorrent, AdAware
Video Playback | Copy and watch movies | Windows 8
Application Development | Compile projects, check out code, download code samples | Visual Studio 2012
The table above describes the workloads of The Destroyer in a bit more detail. Most of the workloads are run independently in the trace, but obviously there are various operations (such as backups) in the background.
AnandTech Storage Bench - The Destroyer - Specs
Reads | 38.83 million
Writes | 10.98 million
Total IO Operations | 49.8 million
Total GB Read | 1583.02 GB
Total GB Written | 875.62 GB
Average Queue Depth | ~5.5
Focus | Worst case multitasking, IO consistency
The name Destroyer comes from the sheer fact that the trace contains nearly 50 million IO operations. That's enough IO operations to effectively put the drive into steady-state and give an idea of the performance in worst case multitasking scenarios. About 67% of the IOs are sequential in nature with the rest ranging from pseudo-random to fully random.
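As a rough sanity check on the specs table, the sketch below derives the read share of all operations and the mean transfer size from the listed totals. It assumes decimal units (1 GB = 10^9 bytes, 1 KB = 10^3 bytes), which the table does not specify, so the results are only approximate.

```python
# Trace totals as listed in the specs table.
READS_MILLION = 38.83
WRITES_MILLION = 10.98
GB_READ = 1583.02
GB_WRITTEN = 875.62

# Assumption: decimal units. The table does not say whether GB means 10**9 or 2**30 bytes.
BYTES_PER_GB = 10**9
BYTES_PER_KB = 10**3


def average_io_size_kb(total_gb: float, ops_million: float) -> float:
    """Mean transfer size in KB for a given data volume and operation count."""
    total_bytes = total_gb * BYTES_PER_GB
    total_ops = ops_million * 1e6
    return total_bytes / total_ops / BYTES_PER_KB


total_ops_million = READS_MILLION + WRITES_MILLION
read_share = READS_MILLION / total_ops_million
avg_read_kb = average_io_size_kb(GB_READ, READS_MILLION)
avg_write_kb = average_io_size_kb(GB_WRITTEN, WRITES_MILLION)
avg_io_kb = average_io_size_kb(GB_READ + GB_WRITTEN, total_ops_million)

print(f"total operations: {total_ops_million:.2f} million")
print(f"reads as share of operations: {read_share:.1%}")
print(f"mean read size:  {avg_read_kb:.1f} KB")
print(f"mean write size: {avg_write_kb:.1f} KB")
print(f"mean IO size:    {avg_io_kb:.1f} KB")
```

Under these assumptions the writes in the trace are, on average, roughly twice the size of the reads, even though reads make up more than three quarters of the operations.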
AnandTech Storage Bench - The Destroyer - IO Breakdown
IO Size | <4KB | 4KB | 8KB | 16KB | 32KB | 64KB | 128KB
% of Total | 6.0% | 26.2% | 3.1% | 2.4% | 1.7% | 38.4% | 18.0%
I've included a breakdown of the IOs in the table above, which accounts for 95.8% of total IOs in the trace. The remaining IOs have relatively rare in-between sizes, none of which has a significant (>1%) share on its own. Over half of the transfers are large IOs (64KB or 128KB), and about one fourth are 4KB in size.
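The figures quoted here can be checked directly against the IO breakdown table. In this minimal sketch, treating 64KB and 128KB as the "large" IOs is an assumption about what the text means.

```python
# Share of total IOs per transfer size, copied from the IO breakdown table.
IO_SIZE_SHARE = {
    "<4KB": 6.0,
    "4KB": 26.2,
    "8KB": 3.1,
    "16KB": 2.4,
    "32KB": 1.7,
    "64KB": 38.4,
    "128KB": 18.0,
}

# Assumption: "large IOs" means the 64KB and 128KB buckets.
LARGE_SIZES = ("64KB", "128KB")

listed_total = sum(IO_SIZE_SHARE.values())
unlisted_share = 100.0 - listed_total
large_share = sum(IO_SIZE_SHARE[size] for size in LARGE_SIZES)

print(f"listed sizes cover {listed_total:.1f}% of IOs")
print(f"other sizes cover  {unlisted_share:.1f}% of IOs")
print(f"64KB + 128KB:      {large_share:.1f}% of IOs")
```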
AnandTech Storage Bench - The Destroyer - QD Breakdown
Queue Depth | 1 | 2 | 3 | 4-5 | 6-10 | 11-20 | 21-32 | >32
% of Total | 50.0% | 21.9% | 4.1% | 5.7% | 8.8% | 6.0% | 2.1% | 1.4%
Despite the average queue depth of 5.5, half of the IOs happen at a queue depth of one, and scenarios where the queue depth is higher than 10 are rather infrequent.
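An average of about 5.5 can coexist with half of the IOs at queue depth one because a mean is sensitive to a small, deep tail. The sketch below illustrates this with the table's percentages. The representative depth chosen for each range (its midpoint) is an assumption, and since the reported average is itself rounded, the implied tail depth is only a rough illustration and not a measured value.

```python
# (representative queue depth, % of IOs) for every bucket except ">32".
# Assumption: each range is represented by its midpoint, e.g. 4-5 -> 4.5.
QD_BUCKETS = [
    (1.0, 50.0),
    (2.0, 21.9),
    (3.0, 4.1),
    (4.5, 5.7),
    (8.0, 8.8),
    (15.5, 6.0),
    (26.5, 2.1),
]
OVER_32_SHARE = 1.4      # % of IOs with queue depth above 32
REPORTED_MEAN_QD = 5.5   # "~5.5" in the specs table, so already rounded

# Contribution of the known buckets to the overall mean queue depth.
partial_mean = sum(qd * share for qd, share in QD_BUCKETS) / 100.0

# Mean depth the ">32" bucket would need for the overall mean to reach 5.5.
implied_tail_qd = (REPORTED_MEAN_QD - partial_mean) * 100.0 / OVER_32_SHARE

# Share of IOs above queue depth 10: the 11-20, 21-32 and >32 buckets.
share_above_10 = 6.0 + 2.1 + OVER_32_SHARE

print(f"mean contribution of QD <= 32 buckets: {partial_mean:.2f}")
print(f"implied mean depth of the >32 bucket:  {implied_tail_qd:.0f}")
print(f"share of IOs above QD 10:              {share_above_10:.1f}%")
```

Under these assumptions the buckets up to QD 32 account for only about 3.5 of the 5.5 average, so the remainder has to come from the 1.4% of IOs issued at very high queue depths.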
The two key metrics I'm reporting haven't changed and I'll continue to report both data rate and latency because the two have slightly different focuses. Data rate measures the speed of the data transfer, so it emphasizes large IOs that simply account for a much larger share when looking at the total amount of data. Latency, on the other hand, ignores the IO size, so all IOs are given the same weight in the calculation. Both metrics are useful, although in terms of system responsiveness I think the latency is more critical. As a result, I'm also reporting two new stats that give us very good insight into high-latency IOs by reporting the share of >10ms and >100ms IOs as a percentage of the total.
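To illustrate why a byte-weighted metric emphasizes large IOs while a per-IO metric does not, the sketch below converts the IO-count shares from the IO breakdown table into approximate shares of the data transferred. The 2KB stand-in for the "<4KB" bucket is an assumption, and the 4.2% of IOs with unlisted sizes are ignored, so the output is indicative only.

```python
# IO size in KB -> share of IO count (%), from the IO breakdown table.
# Assumption: the "<4KB" bucket is represented by 2 KB.
COUNT_SHARE_BY_SIZE_KB = {
    2: 6.0,
    4: 26.2,
    8: 3.1,
    16: 2.4,
    32: 1.7,
    64: 38.4,
    128: 18.0,
}


def byte_shares(count_share: dict) -> dict:
    """Convert per-size IO-count shares into per-size shares of bytes moved (%)."""
    weights = {size: size * share for size, share in count_share.items()}
    total = sum(weights.values())
    return {size: 100.0 * weight / total for size, weight in weights.items()}


BYTE_SHARE_BY_SIZE_KB = byte_shares(COUNT_SHARE_BY_SIZE_KB)

for size_kb, count_pct in COUNT_SHARE_BY_SIZE_KB.items():
    byte_pct = BYTE_SHARE_BY_SIZE_KB[size_kb]
    print(f"{size_kb:>4} KB: {count_pct:5.1f}% of IOs, {byte_pct:5.1f}% of bytes")
```

With these inputs the 4KB IOs make up about a quarter of the operations but only around 2% of the bytes, while the 64KB and 128KB IOs together carry roughly 95% of the data. A per-IO latency average weights that quarter fully, whereas a data rate figure barely registers it.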
In terms of throughput, the SSD 750 is actually marginally slower than the SM951, although when you look at latency the SSD 750 wins by a large margin. The difference in these scores is explained by Intel's focus on random performance: Intel specifically optimized the firmware for high random IO performance, which does have some impact on the sequential performance. As I've explained above, data rate has more emphasis on large IO size transfers, whereas latency treats all IOs the same regardless of their size.
The number of high latency IOs is also excellent and in fact the best we have tested. The SSD 750 is without a doubt a very consistent drive.
132 Comments
Kristian Vättö - Friday, April 3, 2015 - link
As I explained in the article, I see no point in testing such high queue depths in a client-oriented review because the portion of such IOs is marginal. We are talking about a fraction of a percent, so while it would show big numbers it has no relevance to the end-user.
voicequal - Saturday, April 4, 2015 - link
Since you feel strongly enough to levy a personal attack, could you also explain why you think QD128 is important? Anandtech's storage benchmarks are likely a much better indication of user experience unless you have a very specific workload in mind.
d2mw - Friday, April 3, 2015 - link
Guys why are you cutpasting the same old specs table and formulaic article? For a review of the first consumer NVMe I'm sorely disappointed you didn't touch on latency metrics: one of the most important improvements with the NVMe bus
Kristian Vättö - Friday, April 3, 2015 - link
There are several latency graphs in the article and I also suggest that you read the following article to better understand what latency and other storage metrics actually mean (hint: latency isn't really different from IOPS and throughput).
http://www.anandtech.com/show/8319/samsung-ssd-845...
Per Hansson - Friday, April 3, 2015 - link
Hi Kristian, what evidence do you have that the firmware in the SSD 750 is any different from that found in the DC P3600 / P3700?
According to leaked reports released before they have the same firmware: http://www.tweaktown.com/news/43331/new-consumer-i...
And if you read the Intel changelog you see in firmware 8DV10130: "Drive sub-4KB sequential write performance may be below 1MB/sec"
http://downloadmirror.intel.com/23931/eng/Intel_SS...
Which was exactly what you found in the original review of the P3700:
http://www.anandtech.com/show/8147/the-intel-ssd-d...
http://www.anandtech.com/bench/product/1239
Care to retest with the new firmware?
I suspect you will get identical performance.
Per Hansson - Saturday, April 4, 2015 - link
I should be more clear: I mean that you retest the P3700.
And obviously the performance of the 750 won't match that, as it is based off the P3500.
But I think you get what I mean anyway ;)
djsvetljo - Friday, April 3, 2015 - link
I am unclear on which connector this will use. Does it use the video card PCI-E port?
I have MSI Z97 MATE board that has one PCI-E gen3 x16 and one PCI-E gen2 x4. Will I be able to use it and will I be limited somehow?
DanNeely - Friday, April 3, 2015 - link
if you use the 2.0 x4 slot your maximum throughput will top out at 2GB/sec. For client workloads this probably won't matter much since only some server workloads can hit situations where the drive can exceed that rate.
djsvetljo - Friday, April 3, 2015 - link
So it uses the GPU express port although the card pins are visually shorter?
eSyr - Friday, April 3, 2015 - link
> although in real world the maximum bandwidth is about 3.2GB/s due to PCIe inefficiency
What does this phrase mean? If you're referring to 8b10b encoding, this is plainly false, since PCIe gen 3 uses 128b130b coding. If you're referring to the overheads related to TLP and DLLP headers, this depends on the device's and PCIe RC's maximum transaction size. But even with the (minimal) 128 byte limit it would be 3.36 GB/s. In fact, modern PCIe RCs support much larger TLPs, thus eliminating header-related overheads.
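The link-rate arithmetic behind the figures in these comments can be sketched as follows. The transfer rates and line encodings (5 GT/s with 8b/10b for PCIe 2.0, 8 GT/s with 128b/130b for PCIe 3.0) are standard. The per-packet overhead of 22 bytes is a hypothetical placeholder, chosen only because it lands near the quoted 3.36 GB/s; real overhead depends on header size and other link-level traffic, which this sketch ignores.

```python
def link_bandwidth_gb_s(gt_per_s: float, lanes: int,
                        data_bits: int, encoded_bits: int) -> float:
    """Usable link bandwidth in GB/s after line encoding, before packet overhead."""
    bits_per_second = gt_per_s * lanes * data_bits / encoded_bits  # in Gbit/s
    return bits_per_second / 8


def payload_bandwidth_gb_s(link_gb_s: float, payload_bytes: int,
                           overhead_bytes: int) -> float:
    """Bandwidth left for payload when every packet carries fixed overhead."""
    return link_gb_s * payload_bytes / (payload_bytes + overhead_bytes)


# PCIe 2.0 x4: 5 GT/s per lane, 8b/10b encoding.
PCIE2_X4 = link_bandwidth_gb_s(5.0, 4, 8, 10)
# PCIe 3.0 x4: 8 GT/s per lane, 128b/130b encoding.
PCIE3_X4 = link_bandwidth_gb_s(8.0, 4, 128, 130)

# Assumption: 22 bytes of framing, header and CRC per packet (hypothetical value).
ASSUMED_OVERHEAD_BYTES = 22
PCIE3_X4_128B = payload_bandwidth_gb_s(PCIE3_X4, 128, ASSUMED_OVERHEAD_BYTES)
PCIE3_X4_256B = payload_bandwidth_gb_s(PCIE3_X4, 256, ASSUMED_OVERHEAD_BYTES)

print(f"PCIe 2.0 x4 after encoding: {PCIE2_X4:.2f} GB/s")
print(f"PCIe 3.0 x4 after encoding: {PCIE3_X4:.2f} GB/s")
print(f"PCIe 3.0 x4, 128-byte payloads: {PCIE3_X4_128B:.2f} GB/s")
print(f"PCIe 3.0 x4, 256-byte payloads: {PCIE3_X4_256B:.2f} GB/s")
```

The larger the payload per packet, the smaller the share lost to fixed per-packet overhead, which is the point made about larger TLPs.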