According to the principle of entropy, everything that is ordered tends toward disorder. The same happens to your data: once recorded and left unguarded, it slowly decays, regardless of the medium it is stored on. Just as ink on paper fades when exposed to the sun, data on storage media such as HDDs and SSDs can silently become corrupted. This phenomenon is called data rot or data decay and, depending on the type of drive, is caused by the loss of the magnetic orientation of bits on an HDD or by the loss of electrical charge in SSD cells. So do not underestimate the risk this poses to your data storage.
Advanced file systems such as ZFS deal with this problem thanks to their self-healing ability. How does it work? All data in ZFS is saved redundantly (as long as you do not disable this function, which we do not recommend): along with the actual data, its checksum and parity data are written to the storage pool as well. Every read of that data is verified against the checksum; if it does not match, the data has been corrupted. If that happens, the system reconstructs the block from the parity or mirror data, writes the repaired copy back, and returns the correct data to the application.
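The read-verify-repair cycle above can be sketched in a few lines of Python. This is a toy model, not ZFS's actual code path: the `MirroredBlock` class and its two-copy layout are illustrative stand-ins for a two-way mirror, and SHA-256 stands in for ZFS's per-block checksums.

```python
import hashlib

def checksum(data: bytes) -> str:
    """SHA-256 stands in for the block checksum ZFS stores alongside the data."""
    return hashlib.sha256(data).hexdigest()

class MirroredBlock:
    """Toy model of a mirrored block: one logical block, two physical copies."""
    def __init__(self, data: bytes):
        self.copies = [data, data]       # two replicas, as on a 2-way mirror
        self.expected = checksum(data)   # checksum recorded at write time

    def read(self) -> bytes:
        for copy in self.copies:
            if checksum(copy) == self.expected:
                # Self-heal: overwrite any corrupt sibling with the good copy.
                for j in range(len(self.copies)):
                    if checksum(self.copies[j]) != self.expected:
                        self.copies[j] = copy
                return copy
        raise IOError("unrecoverable: all copies failed the checksum")

block = MirroredBlock(b"important data")
block.copies[0] = b"imp0rtant data"           # simulate silent bit rot
assert block.read() == b"important data"      # the read still succeeds
assert block.copies[0] == b"important data"   # and the bad copy was repaired
```

Note that the repair happens as a side effect of an ordinary read, which is exactly why rarely read data misses out on it.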
Self-healing means that all the data we read from storage is verified on an ongoing basis and, if necessary, repaired. So what can go wrong? A large part of your data is not read on a daily, or even weekly, basis, and, as we already know, the probability of corruption increases over time. The less frequently data is read, the greater the chance it has silently rotted. This leads to a dangerous asymmetry: the most frequently used data is safe, because it constantly goes through the check-and-repair procedure, while rarely used data (which does not mean it is unimportant) is not only more likely to suffer data rot, but also passes through the self-healing procedure far less often.
Of course, there is a solution for that too. ZFS-based systems such as Open-E JovianDSS provide the Scrub tool. As the name suggests, it scrubs all the data in the pool. What is the purpose of such a procedure? As mentioned, some of the data in our pool is rarely read, which carries the risk of undetected corruption. To prevent that, during a scrub the system reads all the data in the pool, checks it against the checksums, and if corrupted data is found, it marks the affected drive block as unusable and writes a copy recovered from parity or mirror to a new location on the drive. It is, in short, a drive-hygiene task, and it is necessary to ensure data integrity.
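Conceptually, a scrub is just the self-healing check applied proactively to every block instead of waiting for a read. The sketch below models that idea in Python; the `pool` layout (a list of dicts with a primary copy, a mirror copy, and the write-time checksum) is a hypothetical structure for illustration, not the on-disk ZFS format.

```python
import hashlib

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def write_block(data: bytes) -> dict:
    """Store a block redundantly, recording its checksum at write time."""
    return {"data": data, "mirror": data, "sum": checksum(data)}

def scrub(pool: list) -> int:
    """Walk every block in the pool, verify it, repair it from the mirror.

    Returns the number of blocks repaired.
    """
    repaired = 0
    for block in pool:
        if checksum(block["data"]) != block["sum"]:
            # Primary copy has rotted; restore it from the intact mirror.
            assert checksum(block["mirror"]) == block["sum"], "unrecoverable"
            block["data"] = block["mirror"]
            repaired += 1
    return repaired

pool = [write_block(b"rarely read data"), write_block(b"hot data")]
pool[0]["data"] = b"rarely re@d data"   # silent corruption nobody has noticed
assert scrub(pool) == 1                 # the scrub finds and fixes it
assert pool[0]["data"] == b"rarely read data"
```

The key point the model captures: without the scrub pass, that first block would have stayed corrupted until someone happened to read it.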
Until quite recently, storage admins may have felt a certain reluctance to perform this process due to its duration. The older scrub algorithm was indeed inefficient and resource-hungry, so scrubbing large volumes of data could take a very long time – sometimes even longer than a month. The main reason was that it traversed blocks in the order they were written rather than the order in which they sit on the disk, which on spinning drives translates into a flood of slow random reads.
Fortunately, this problem was solved by the implementation of a sequential scrub algorithm, which scans the metadata instead of the data itself to build an in-memory list of data blocks. The blocks are then sorted by size and offset, arranging them in the sequential order of their physical location on the drive, and only then are they read and checked against their checksums to identify corrupted data. This significantly speeds up the process, shortening it to several days (the exact gain depends on the type of data and how it is distributed on the disk). So, no more excuses!
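The heart of the improvement is the reordering step: turning a metadata-order block list into a disk-order one so the drive can read sequentially. A minimal sketch, with made-up `(offset, size)` pairs standing in for what the metadata scan would actually yield:

```python
# Phase 1: the metadata scan yields (offset, size) for each block in the
# order the blocks were written -- effectively random physical locations.
# The offsets below are purely illustrative.
birth_order = [(900, 128), (10, 64), (500, 256), (30, 64), (700, 128)]

# Sort the in-memory list by physical offset, so that phase 2 can sweep
# the platter front to back with large sequential reads instead of seeks.
sequential_order = sorted(birth_order, key=lambda block: block[0])

assert sequential_order == [(10, 64), (30, 64), (500, 256), (700, 128), (900, 128)]
```

On an HDD, reading the blocks in `sequential_order` avoids most head seeks, which is where the multi-week-to-several-days improvement comes from.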
The sequential scrub algorithm is available in Open-E JovianDSS starting from the Up29 version. Stay tuned and upgrade your storage!
How often should I scrub my disks? The precise answer depends on your particular setup and workload.
Protip! As a general rule, a scrub should be carried out at least once a quarter – this ensures high integrity of all your data, regardless of how frequently it is read.