The first version of the Non-Volatile Memory Express (NVMe) standard was ratified almost five years ago, but its development didn't stop there. While SSD controller manufacturers have been hard at work implementing NVMe in more and more products, the protocol itself has acquired new features. Most of them are optional and aimed at enterprise scenarios like virtualization and multi-path I/O, but one feature introduced in the NVMe 1.2 revision has been picked up by a controller that will likely see use in the consumer space.

The Host Memory Buffer (HMB) feature in NVMe 1.2 allows a drive to request exclusive access to a portion of the host system's RAM for the drive's private use. This kind of capability has been around forever in the GPU space under names like HyperMemory and TurboCache, where it served a similar purpose: to reduce or eliminate the dedicated RAM that needs to be included on peripheral devices.

Modern high-performance SSD controllers use a significant amount of RAM, typically in a ratio of about 1GB of RAM for every 1TB of flash. Controllers are usually conservative about using that RAM as a cache for user data (to limit the damage of a sudden power loss); instead, it mostly stores the organizational metadata the controller needs to keep track of which data is stored where on the flash chips. The goal is that when the drive receives a read or write request, it can determine which flash memory location needs to be accessed with a much quicker lookup in the controller's DRAM, and the drive doesn't need to update the metadata copy stored on the flash after every single write operation completes. For fast, consistent performance, the data structures are chosen to minimize the amount of computation and the number of RAM lookups required, at the expense of requiring more RAM.
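The 1GB-per-1TB ratio falls out of some simple arithmetic. A minimal sketch, assuming a flat logical-to-physical mapping table with 4KB granularity and 32-bit entries (illustrative numbers, not any vendor's actual layout):

```python
# Back-of-the-envelope sketch of why an SSD's logical-to-physical (L2P)
# mapping table needs roughly 1 GB of DRAM per 1 TB of flash.
# All sizes here are illustrative assumptions.

PAGE_SIZE = 4096           # 4 KiB mapping granularity (assumed)
DRIVE_CAPACITY = 1 << 40   # 1 TiB of flash
ENTRY_SIZE = 4             # one 32-bit physical page address per entry

entries = DRIVE_CAPACITY // PAGE_SIZE   # 268,435,456 entries
table_bytes = entries * ENTRY_SIZE      # -> a 1 GiB table

# A flat array makes translation a single indexed read: O(1) and one
# DRAM access per request, at the cost of keeping the table resident.
l2p = [0] * 1024                        # toy table covering 1024 pages

def lookup(logical_page: int) -> int:
    """Translate a logical page number to a physical flash page."""
    return l2p[logical_page]

l2p[42] = 7777                          # controller remaps a page on write
assert lookup(42) == 7777
```

A more compact tree or cached mapping would shrink the table, but each lookup would then cost extra computation or extra RAM accesses, which is exactly the trade-off described above.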

At the low end of the SSD market, recent controller configurations have instead chosen to cut costs by not including any external DRAM. This configuration offers combined savings in die size and pin count for the controller, as well as reduced PCB complexity for the drive and the elimination of the DRAM chip from the bill of materials, which can add up to a competitive advantage in product segments where performance is a secondary concern and every cent counts. Silicon Motion's DRAM-less SM2246XT controller has stolen some market share from their own already cheap SM2246EN, and in the TLC space almost everybody is moving toward DRAM-less options.

The downside is that without ample RAM, it is much harder for SSDs to offer high performance. With clever firmware, DRAM-less SSDs can cope surprisingly well using just the on-chip buffers, but they are still at a disadvantage. That's where the Host Memory Buffer feature comes in. With only two NAND channels on the 88NV1140, it probably can't saturate the PCIe 3.0 x1 link under even the best circumstances, so there will be bandwidth to spare for other transfers with the host system. PCIe transactions and host DRAM accesses are measured in tens or hundreds of nanoseconds, compared to tens of microseconds for reading from flash, so it's clear that a Host Memory Buffer can be fast enough to be useful for a low-end drive.

The trick then is to figure out how to get the most out of a Host Memory Buffer while remaining prepared to operate in DRAM-less mode if the host's NVMe driver doesn't support HMB or if the host decides it can't spare the RAM. SSD suppliers are universally tight-lipped about the algorithms used in their firmware, and Marvell controllers are usually paired with custom or third-party licensed firmware anyway, so we can only speculate about how an HMB will be used with this new 88NV1140 controller. Furthermore, the requirement of driver support on the host side means this feature will likely be used in embedded platforms long before it finds its way into retail SSDs, and this particular Marvell controller may never show up in a standalone drive. But in a few years' time it might be standard for low-end SSDs to borrow a bit of your system's RAM. This becomes less of a concern as successive platforms ship with more DRAM in a standard system.
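The DRAM-less fallback is built into the protocol itself: per NVMe 1.2, the controller advertises preferred and minimum buffer sizes (the HMPRE and HMMIN fields of Identify Controller, in 4KiB units), and the driver may grant anything in that range, or nothing at all. A sketch of the driver-side decision, with a hypothetical allocation policy (the field names come from the spec; the helper and budget rule are illustrative):

```python
# Driver-side HMB negotiation sketch. HMPRE/HMMIN are real NVMe 1.2
# Identify Controller fields (reported in 4 KiB units); the budgeting
# policy below is a made-up example, not any OS's actual behavior.

PAGE_UNIT = 4096   # HMPRE and HMMIN are expressed in 4 KiB units

def negotiate_hmb(hmpre: int, hmmin: int, host_free_bytes: int):
    """Decide how much host RAM to lend the drive; None means decline.

    The controller must still function DRAM-less when the answer is None,
    which is why the feature degrades gracefully on older drivers."""
    if hmpre == 0:
        return None                     # controller doesn't want an HMB
    preferred = hmpre * PAGE_UNIT
    minimum = hmmin * PAGE_UNIT
    # Arbitrary example policy: lend at most 1/64th of free host memory.
    grant = min(preferred, host_free_bytes // 64)
    if grant < minimum:
        return None                     # can't spare enough; decline
    return grant

# A controller asking for 64 MiB preferred / 8 MiB minimum on a host
# with 8 GiB free gets its full preferred allocation:
print(negotiate_hmb(hmpre=16384, hmmin=2048, host_free_bytes=8 << 30))
```

After deciding on a size, the real driver would describe the allocated pages to the controller via a Set Features command (the Host Memory Buffer feature, 0Dh) before enabling it; the drive then treats that memory as its own until the host reclaims it.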

Source: Marvell

Comments

  • The_Assimilator - Tuesday, January 12, 2016 - link

    The only way this will help DRAM-less SSDs is if they're using system RAM for storing their page tables, and that's already a bad idea.
  • extide - Tuesday, January 12, 2016 - link

    Yeah that's exactly the point, storing the page table data in system ram. You could do this in ways where it would be a good idea, not a bad one.
  • bug77 - Tuesday, January 12, 2016 - link

    I'm not sure I want to find out my data was still in RAM when the power went out.
  • extide - Tuesday, January 12, 2016 - link

    It's not a data cache, but a metadata storage location.
  • name99 - Tuesday, January 12, 2016 - link

    Obviously I don't know what they are doing exactly, but this sort of thing is not completely unprecedented.
    For example fairly recently (in 10.10 or 10.11) Apple have changed the tree data structure they use to describe the contents of JHFS+ volumes. In an ideal world, this better data structure would also be stored on the volume, but that would be a non-backwards-compatible change; so instead they construct the tree at mount-time and use it in RAM, but it has no persistence. This makes mounting a little slower but what can you do.

    So in principle the SSD could do the same thing --- use compressed state within flash to describe data layout along with a faster in-RAM version of the same data. The issue then is simply to ensure that any PERSISTENT change to the RAM version of the data structure is pushed out to flash in a timely and safe manner. That's not trivial, of course, but it's the standard file system problem (and in principle easier for the SSD because it has more control over the exact ordering of writes than a file system does).
    Time will show whether Marvell solved it with the robustness of a modern file system or with the "robustness" of FAT.
  • zodiacfml - Tuesday, January 12, 2016 - link

    Nice. But I'm not sure this will be utilized in the cheap devices which need this, while the more expensive drives will boast more RAM as a marketing tool.
  • Visual - Tuesday, January 12, 2016 - link

    Can this even happen in OS-agnostic ways or will it need drivers for each OS to tell it not to touch that part of RAM etc?

    And sure, move the controller's working memory to my system RAM. What's next... move the controller's logic itself to my system RAM and have it ran by my system CPU instead of making a controller at all? Then call it "revolutionary new progress".
  • extide - Tuesday, January 12, 2016 - link

    Yes, it needs the NVMe driver to at least support rev 1.2
  • DanNeely - Tuesday, January 12, 2016 - link

    It'll need driver support; but that will consist of little more than doing a memory allocation to hold it.

    Moving the controller logic to the CPU won't work on anything this side of a real time OS; CPU scheduling's way too unpredictable, and the size of the controller is probably limited by the number of bumps it needs for IO pins to talk to the PCIe bus and flash chips anyway, so it wouldn't help. The reason devices can offload memory to the main system without major latency penalties is that for the last 20+ years, the cpu/memory controller/etc platforms have all supported direct memory access, which lets devices on an IO bus talk to the memory controller directly without having to raise interrupts and wait until the CPU gets around to handling them some time later.
  • Visual - Wednesday, January 13, 2016 - link

    Uh, what? Of course moving the controller logic to the CPU can work, CPU scheduling is a non-issue when your code is a ring-0 driver. Well, a simpler "controller" will still remain in hardware, but it will not have to deal with any block remapping, wear levelling, caching and whatever else. Just give raw 1:1 access to all the blocks. This is nothing new, it could have been done in the very first SSDs, and just like back then it was deemed a bad idea, I think it is still a bad idea today, and that was my entire point. Depending on OS-specific drivers where so many things can go wrong so easily is not worth the small cost savings.
