Introduction: Enterprise Storage 101

Since the introduction of x86 based servers at the end of the 20th century, the cost of server hardware has declined rapidly while the performance per watt and performance per dollar has increased rapidly. This pushed the server market to evolve from closed, proprietary, and most importantly extremely expensive mainframe and proprietary RISC servers into today's highly competitive x86 server market. However, the professional storage market is still ruled by the proprietary, legacy systems.

Today, you can get a very powerful server that can cope with most server workloads for something like $5000. Even better, you can run tens of workloads in parallel by virtualizing them. But go to the storage market with four or even five times the budget and you will likely return with a low end SAN.

Worse yet is that there is good chance this expensive device will choke regularly due to the use of a storage intensive application. We quote a market survey conducted in march 2013:

Forty-four percent of respondents said disproportionate storage-related costs were an obstacle preventing them from virtualizing more of their workloads. Forty-two percent said the same about performance degradation or the inability to meet performance expectations.

Note that the study does not mention the percentage of customers stuck in denial :-). The performance per dollar of the average SAN array is mediocre at best, and the storage capacity per dollar is simply awful.

You might think that the hardware inside a SAN is vastly superior to what can be found in your average server, but that is not the case. EMC (the market leader) and others have disclosed more than once that “the goal has always to been to use as much standard, commercial, off-the-shelf hardware as we can”. So your SAN array is probably nothing more than a typical Xeon server built by Quanta with a shiny bezel. A decent professional 1TB drive costs a few hundred dollars. Place that same drive inside a SAN appliance and suddenly the price per terabyte is multiplied by at least three, sometimes even 10! When it comes to pricing and vendor lock-in you can say that storage systems are still stuck in the “mainframe era” despite the use of cheap off-the-shelf hardware.

So why do EMC, NetApp, and the other giants in the storage market charge so much for what is essentially a Xeon based server, an admittedly well designed and reliable storage backplane, and some unreliable and slow performing hard drives? The reason is not some complex market situation that can only be explained by financial experts. No, the key is simply the basic component of a storage system: the unreliable and slow magnetic disk.

As we all know, the magnetic disk is the component that fails most in the data center, and it is by far the slowest core component in modern computers. Building a reliable and somewhat performant storage system based upon such a mediocre storage component can only be done with complex software. And complex software is yet another prime reason why IT services fail. So only a few companies that were able to build a solid reputation gained enough trust to succeed in the storage market.

This is why EMC, NetApp, IBM, and HP rule the storage market today even though they are charging an arm and a leg for a few terabytes of capacity. Professional buyers trust the devices these vendors make and are willing to pay a huge premium just to be sure that they get good reliability and decent but hardly compelling performance and capacity.

The Winds of Change
Comments Locked


View All Comments

  • jhh - Wednesday, August 7, 2013 - link

    And Advanced/SDDC/Chipkill ECC, not the old-fashioned single-bit correct/multiple bit detect. The RAM on the disk controller might be small enough for this not to matter, but not on the system RAM.
  • tuxRoller - Monday, August 5, 2013 - link

    Amplidata's dss seems like a better, more forward looking alternative.
  • Sertis - Monday, August 5, 2013 - link

    The Amplistore design seems a bit better than ZFS. ZFS has a hash to detect bit rot within the blocks, while this stores FEC coding that can potentially recover the data within that block without calculating it based on parity from the other drives on the stripe and the I/O that involves. It also seems to be a bit smarter on how it distributes data by allowing you to cross between storage devices to provide recovery at the node level while ZFS is really just limited to the current pool. It has various out of band data rebalancing which isn't really present in ZFS. For example, add a second vdev to a zpool when it's 90% full and there really isn't a process to automatically rebalance the data across the two pools as you add more data. The original data stays on that first vdev, and new data basically sits in the second vdev. It seems very interesting, but I certainly can't afford it, I'll stick with raidz2 for my puny little server until something open source comes out with a similar feature set.
  • Seemone - Tuesday, August 6, 2013 - link

    Are you aware that with ZFS you can specify the number of replicas each data block should have on a per-filesystem basis? ZFS is indeed not very flexible on pool layout and does not rebalance things (as of now), but there's nothing in the on-disk data structure that prevent this. This means it can be implemented and would be applicable on old pools in a non disruptive way. ZFS, also, is open source, its license is simply not compatible with GPLv2, hence ZFS-On-Linux separate distribution.
  • Brutalizer - Tuesday, August 6, 2013 - link

    If you want to rebalance ZFS, you just copy the data back and forth and rebalancing is done. Assume you have data on some ZFS disks in a ZFS raid, and then you add new empty discs, so all data will sit on the old disks. To spread the data evenly to all disks, you need to rebalance the data. Two ways:
    1) Move all data to another server. And then move back the data to your ZFS raid. Now all data are rebalanced. This requires another server, which is a pain. Instead, do like this:
    2) Create a new ZFS filesystem on your raid. This filesystem is spread out on all disks. Move the data to the new ZFS filesystem. Done.
  • Sertis - Thursday, August 8, 2013 - link

    I'm definitely looking forward to these improvements, if they eventually arrive. I'm aware of the multiple copy solution, but if you read the Intel and Amplistore whitepapers, you will see they have very good arguments that their model works better than creating additional copies by spreading out FEC blocks across nodes. I have used ZFS for years, and while you can work around the issues, it's very clear that it's no longer evolving at the same rate since Oracle took over Sun. Products like this keep things interesting.
  • Brutalizer - Tuesday, August 6, 2013 - link

    Theory is one thing, real life another. There are many bold claims and wonderful theoretical constructs from companies, but do they hold up to scrutiny? Researchers injected artificially constructed errors in different filesystems (NTFS, Ext3, etc), and only ZFS detected all errors. Researchers have verified that ZFS seems to combat data corruption. Are there any research on Amplistore's ability to combat datacorruption? Or do they only have bold claims? Until I see research from a third part, independent part, I will continue with the free open source ZFS. CERN is now switching to ZFS for tier-1 and tier-2 long time term storage, because vast amounts of data _will_ have data corruption, CERN says. Here are research papers on data corruption on NTFS, hardware raid, ZFS, NetApp, CERN, etc:
    For instance, Tegile, Coraid, GreenByte, etc - are all storage vendors that offers PetaByte Enterprise servers using ZFS.
  • JohanAnandtech - Tuesday, August 6, 2013 - link

    Thanks, very helpful feedback. I will check the paper out
  • mikato - Thursday, August 8, 2013 - link

    And Isilon OneFS? Care to review one? :)
  • bitpushr - Friday, August 9, 2013 - link

    That's because ZFS has had a minimal impact on the professional storage market.

Log in

Don't have an account? Sign up now