    How to Build a Fast Data Storage Server – Software and Hardware Caching Approaches

    When it comes to performance, data storage administrators too often think they have to choose between speed and cost. Fortunately, there is a wide range of ways to achieve decent data storage performance without spending all your money, and caching is the key.

    In this article I would like to show some options to consider when selecting the best value-for-money data storage solution. There is no way to name all of them, since the final performance and the total cost depend on many factors, such as the hardware (CPU, HBAs, memory, NICs, quality cables and so on) and the software that uses the hardware in an effective way.

    But in my opinion the key factor is the caching model, because it lets even mid-range storage appliances reach a performance level you will be satisfied with. This is particularly important when we talk about ZFS-based storage, because ZFS is well known for its efficient caching mechanisms.

    So let’s discuss the different software and hardware caching approaches to show that small changes can do a lot to improve your data storage.

    Caching Approaches

    To improve the performance of storage built on cost-effective drives, you can deploy several types of caching that use your valuable RAM in the most effective way. A cache uses fast memory, located closer to the processor, to store copies of frequently or recently used data (in ZFS it’s a combination of both, as we will see in the next section). A cache usually has a single level, but ZFS also offers a second level of read cache, as well as a write log that protects data and helps turn random write operations into sequential portions. But let’s start with the single-level cache.

    ZFS Level 1 Cache (ARC)

    In ZFS, the level 1 cache is kept in RAM only and uses the Adaptive Replacement Cache (ARC) algorithm. ARC stores both the Most Recently Used (MRU) data and the Most Frequently Used (MFU) data. In addition, ARC keeps so-called ghost lists for the MRU and the MFU data. These record which blocks were recently evicted, so ARC can tell whether the same data is being used over and over (which favors the MFU) or mostly new data is being processed (which favors the MRU). This way ARC adapts to the load and provides more space for the MRU or the MFU depending on the current needs.

    A ghost list holds no data, only index pointers that track the blocks recently removed from RAM. The ghost MFU tracks the blocks evicted from the MFU, and the ghost MRU tracks the blocks evicted from the MRU in the same way. When a requested block turns up in a ghost list, ARC knows that the corresponding list was too small and shifts the balance toward it.
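
    To make the interplay between the live lists and the ghost lists more concrete, here is a minimal sketch of the textbook ARC algorithm in Python. It is not the ZFS implementation: the class name `SimpleARC`, the `load` callback and the whole-item capacity are assumptions made for illustration, and the real ZFS ARC is considerably more elaborate.

```python
from collections import OrderedDict


class SimpleARC:
    """Toy adaptive replacement cache (illustrative, not ZFS code)."""

    def __init__(self, capacity):
        self.c = capacity          # number of items the cache may hold
        self.p = 0                 # target size of the MRU list
        self.t1 = OrderedDict()    # MRU: items seen once recently
        self.t2 = OrderedDict()    # MFU: items seen at least twice
        self.b1 = OrderedDict()    # ghost MRU: keys only, no data
        self.b2 = OrderedDict()    # ghost MFU: keys only, no data

    def _replace(self, key):
        # Evict the oldest item from the MRU list if it is over its target
        # size, otherwise from the MFU list. Only the key moves to a ghost list.
        over_target = len(self.t1) > self.p or (
            key in self.b2 and len(self.t1) == self.p
        )
        if self.t1 and over_target:
            old, _ = self.t1.popitem(last=False)
            self.b1[old] = None
        else:
            old, _ = self.t2.popitem(last=False)
            self.b2[old] = None

    def get(self, key, load):
        if key in self.t1:                      # second access: promote to MFU
            self.t2[key] = self.t1.pop(key)
            return self.t2[key]
        if key in self.t2:                      # repeated access: refresh in MFU
            self.t2.move_to_end(key)
            return self.t2[key]

        value = load(key)                       # cache miss: fetch from storage
        if key in self.b1:                      # ghost MRU hit: MRU was too small
            self.p = min(self.c, self.p + max(1, len(self.b2) // len(self.b1)))
            self._replace(key)
            del self.b1[key]
            self.t2[key] = value
        elif key in self.b2:                    # ghost MFU hit: MFU was too small
            self.p = max(0, self.p - max(1, len(self.b1) // len(self.b2)))
            self._replace(key)
            del self.b2[key]
            self.t2[key] = value
        else:                                   # never seen (or long forgotten)
            total = len(self.t1) + len(self.t2) + len(self.b1) + len(self.b2)
            if len(self.t1) + len(self.b1) == self.c:
                if len(self.t1) < self.c:
                    self.b1.popitem(last=False)
                    self._replace(key)
                else:
                    self.t1.popitem(last=False)
            elif total >= self.c:
                if total == 2 * self.c:
                    self.b2.popitem(last=False)
                self._replace(key)
            self.t1[key] = value
        return value
```

    Calling `cache.get(block_id, read_from_disk)` with a hypothetical `read_from_disk` function returns the cached value when possible. The interesting part is `self.p`: every ghost hit nudges it, which is how the cache shifts space between the MRU and the MFU.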

    ZFS additionally extends the original Adaptive Replacement Cache (ARC) algorithm with the following improvements:

    • The ZFS ARC occupies a large part of the available RAM by default (the exact limit depends on the platform), and its total size follows the kernel’s decisions: if the kernel needs more RAM, the ARC shrinks to make room. Otherwise, the ARC is able to occupy all the space that is available to it.
    • The ZFS ARC works with various block sizes.
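
    If you want to check how well the ARC is doing on your own system, the hit ratio is a good first number to look at. The sketch below assumes OpenZFS on Linux, where the counters are exposed as text in `/proc/spl/kstat/zfs/arcstats`; the path, the `hits` and `misses` field names and the sample values are assumptions, so verify them on your platform. The helper names `parse_arcstats` and `hit_ratio` are made up for this example.

```python
# Made-up sample in the layout of an arcstats file: one header line,
# a column-title line, then "<name> <type> <value>" rows.
SAMPLE = """\
13 1 0x01 98 26656 1234567890 9876543210
name                            type data
hits                            4    900
misses                          4    100
size                            4    4294967296
c_max                           4    8589934592
"""


def parse_arcstats(text):
    """Return a dict of counter name -> integer value."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep only three-column rows with a numeric value; this skips
        # the header line and the "name type data" title line.
        if len(parts) == 3 and parts[2].isdigit():
            stats[parts[0]] = int(parts[2])
    return stats


def hit_ratio(stats):
    """Fraction of ARC lookups that were served from RAM."""
    total = stats["hits"] + stats["misses"]
    return stats["hits"] / total if total else 0.0


stats = parse_arcstats(SAMPLE)
print(f"ARC hit ratio: {hit_ratio(stats):.1%}")
print(f"ARC size: {stats['size'] / 2**30:.1f} GiB of {stats['c_max'] / 2**30:.1f} GiB max")
```

    On a real system you would pass the contents of the file instead of `SAMPLE`. A consistently low hit ratio suggests that the working set does not fit in RAM, which is where the next caching level comes in.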

    Level 2 Adaptive Replacement Cache (L2ARC)

    L2ARC is an extension of the ARC that comes into play when data is about to be removed from the ARC. An algorithm copies the MRU and MFU records that are about to be evicted into a buffer, which is subsequently written to the L2ARC drive.

    L2ARC is usually located on a drive that is not as fast as RAM but still performs well (e.g. an SSD), so the data can be retrieved quickly. This way you save RAM capacity but still have access to the cached data.
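
    A rough way to judge whether a second cache level pays off is to estimate the average read latency from the hit ratios of both levels. The function below is only a back-of-the-envelope model, and all latency figures in the example are hypothetical placeholders rather than measurements.

```python
def effective_read_latency(arc_hit, l2arc_hit, t_ram, t_l2arc, t_pool):
    """Average read latency for a two-level read cache.

    arc_hit   -- fraction of reads served from the ARC (RAM)
    l2arc_hit -- fraction of ARC misses served from the L2ARC device
    t_*       -- latency of each tier, all in the same unit
    """
    arc_miss = 1.0 - arc_hit
    return (
        arc_hit * t_ram
        + arc_miss * l2arc_hit * t_l2arc
        + arc_miss * (1.0 - l2arc_hit) * t_pool
    )


# Hypothetical latencies in milliseconds: RAM, SSD cache device, HDD pool.
without_l2arc = effective_read_latency(0.80, 0.0, 0.0001, 0.1, 10.0)
with_l2arc = effective_read_latency(0.80, 0.5, 0.0001, 0.1, 10.0)
print(f"without L2ARC: {without_l2arc:.3f} ms, with L2ARC: {with_l2arc:.3f} ms")
```

    The model ignores queueing, write traffic and the RAM that the L2ARC index itself needs, so treat the result as an orientation, not a prediction.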

    ZIL and SLOG (Write Log)

    Additionally, ZFS-based storage systems provide a logging mechanism, the ZFS Intent Log (ZIL), along with the option to use a separate log (SLOG) device to speed it up. The ZIL’s primary aim is to record synchronous writes quickly in short-term stable storage, which protects the data in case of a power failure. The data is then written from RAM across the pool for long-term storage. Because these pool writes are grouped together, they also reach the final drives in more sequential portions, which minimises random writes. For best performance, the ZIL should be placed on a separate, fast SLOG device. If there is no SLOG, the ZIL is created on the main storage devices, where the final data location is.
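
    The write path described above can be sketched as a tiny intent-log model. This is a simplified illustration of the idea, not how ZFS is implemented: the class `ToyIntentLog` and its methods are invented for this example, and plain Python containers stand in for the SLOG device, the RAM and the pool.

```python
class ToyIntentLog:
    """Toy model of a write log in front of a storage pool."""

    def __init__(self):
        self.log = []       # stands in for the ZIL on a fast SLOG device
        self.pending = {}   # stands in for not-yet-written data in RAM
        self.pool = {}      # stands in for the main storage pool

    def sync_write(self, key, value):
        # Append to the log first (a cheap sequential write), then
        # acknowledge. The pool itself is not touched yet.
        self.log.append((key, value))
        self.pending[key] = value
        return "ack"

    def commit(self):
        # Periodically flush everything from RAM to the pool in one
        # ordered batch; the log entries are then no longer needed.
        for key in sorted(self.pending):
            self.pool[key] = self.pending[key]
        self.pending.clear()
        self.log.clear()

    def crash(self):
        # A power failure loses RAM contents, but not the log.
        self.pending.clear()

    def replay(self):
        # After a restart, the log is read back to recover the writes
        # that were acknowledged but never reached the pool.
        for key, value in self.log:
            self.pool[key] = value
        self.log.clear()


zil = ToyIntentLog()
zil.sync_write(7, "block A")
zil.sync_write(3, "block B")
zil.crash()      # data in RAM is gone
zil.replay()     # but the acknowledged writes are recovered
print(zil.pool)
```

    Note that in normal operation the log is only written, never read. It is read back only after a crash, which is why a small device with low write latency is a good fit for it.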

    Hardware Solutions

    Software caching approaches are not the only way to speed up your system. You can also use hardware that makes your cache more efficient. There are plenty of options on the market, and they differ mainly in technology and cost. Below you can find some examples with their pros and cons to consider.

    Fast ECC RAM Memory

    Since the ARC lives in RAM, one option is to buy more ECC RAM for your data storage solution, which offers best-in-class performance. Note, however, that RAM is not always cost-efficient, and in some cases buying more and more of it is just a waste of money, since the other hardware options are cheaper and easier to scale.

    NVMe Drives

    Non-Volatile Memory Express (NVMe) SSDs are much faster than both SAS and SATA drives. They can reach read speeds of about 7000 MB/s, whereas even the fastest SATA SSDs are limited to roughly 550 MB/s by the 6 Gb/s SATA interface.

    Intel® Optane™

    Among the NVMe drives, Intel® Optane™ technology should be taken into account. The big advantage of this technology is that the drive performance is stable over time and doesn’t decrease as it does with regular SSDs. Intel® Optane™ drives are cheaper than ECC RAM, and even if they are not as fast as RAM, they are still really fast.

    In comparison to regular SSDs, an Intel® Optane™ drive offers significantly lower and more stable latency, and an extended lifetime (up to 60 drive writes per day, or DWPD, over 5 years). What is more, Intel® Optane™ used as a write log protects the regular SSDs in the pool and extends their lifetime thanks to more sequential and less frequent writes.
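
    To put the DWPD figure into perspective, you can convert it into the total amount of data the drive is rated to absorb over its warranty period. The drive capacity used below is a hypothetical example, not a specific product.

```python
def rated_writes_tb(capacity_tb, dwpd, years):
    """Total rated writes in TB: capacity x drive writes per day x days."""
    return capacity_tb * dwpd * 365 * years


# Hypothetical 0.375 TB write-log device rated at 60 DWPD over 5 years.
total_tb = rated_writes_tb(0.375, 60, 5)
print(f"{total_tb:,.1f} TB, or about {total_tb / 1000:.0f} PB of rated writes")
```

    For a write log that absorbs every synchronous write of the pool, this endurance figure matters more than the raw capacity of the device.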

    SATA SSD Drives

    If you are looking for a good caching solution on a fixed budget, SATA SSDs may be a good choice for you. While not as fast as NVMe drives, they provide good read and write speeds, which makes them a reasonable fit for a data storage architecture. These drives are cheaper than all the solutions mentioned above, but they are also slower and their performance decreases over time. They still offer good value for money, especially if you don’t require a very fast system.

    What to Choose?

    Advanced ZFS caching mechanisms such as ARC, L2ARC and the write log give you a flexible solution that balances cost and performance well. In addition, you can use hardware such as NVMe drives, which are not as expensive as ECC RAM, or SATA SSDs, which are significantly slower but cheaper.

    Taking everything into account, you can now decide which solution fits the needs of your storage architecture, both in terms of performance and in terms of the money required to build it.
