According to the principle of entropy, everything that is ordered tends toward disorder. The same happens to your data: once recorded and left unguarded, it slowly decays, regardless of the medium it is stored on. Just as ink on paper fades when exposed to the sun, data on storage media such as HDDs and SSDs can silently become corrupted. This phenomenon is called data rot or data decay and, depending on the type of drive, is caused by the loss of the magnetic orientation of bits on an HDD or by the loss of electrical charge in SSD cells. So do not underestimate the risk this poses to your data storage.
Advanced file systems such as ZFS deal with this problem thanks to their self-healing ability. How does it work? All data in ZFS is saved redundantly (as long as you do not disable this function, which we do not recommend): along with the actual data, its checksum and parity data are written to the storage pool as well. Every read of that data is verified against the checksum; if it does not match, the data has been corrupted. If that happens, the system reconstructs the block from the parity or mirror data, writes the repaired copy back, and returns the correct data to the application.
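The read-verify-repair cycle above can be sketched in a few lines of Python. This is a toy model, not ZFS's actual code path: the `MirroredBlock` class and its two-copy layout are illustrative stand-ins for a two-way mirror, and SHA-256 stands in for ZFS's per-block checksums.

```python
import hashlib

def checksum(data: bytes) -> str:
    """SHA-256 stands in for the block checksum ZFS stores alongside the data."""
    return hashlib.sha256(data).hexdigest()

class MirroredBlock:
    """Toy model of a mirrored block: one logical block, two physical copies."""
    def __init__(self, data: bytes):
        self.copies = [data, data]       # two replicas, as on a 2-way mirror
        self.expected = checksum(data)   # checksum recorded at write time

    def read(self) -> bytes:
        for copy in self.copies:
            if checksum(copy) == self.expected:
                # Self-heal: overwrite any corrupt sibling with the good copy.
                for j in range(len(self.copies)):
                    if checksum(self.copies[j]) != self.expected:
                        self.copies[j] = copy
                return copy
        raise IOError("unrecoverable: all copies failed the checksum")

block = MirroredBlock(b"important data")
block.copies[0] = b"imp0rtant data"           # simulate silent bit rot
assert block.read() == b"important data"      # the read still succeeds
assert block.copies[0] == b"important data"   # and the bad copy was repaired
```

Note that the repair happens as a side effect of an ordinary read, which is exactly why rarely read data misses out on it.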
Self-healing means that all the data we read from storage is verified on an ongoing basis and, if necessary, repaired. So what can go wrong? A large part of your data is not read on a daily, or even weekly, basis, and, as we already know, the probability of corruption increases over time. The less frequently data is read, the greater the chance it has silently rotted. This leads to a dangerous asymmetry: the most frequently used data is safe, because it constantly goes through the check-and-repair procedure, while rarely used data (which does not mean it is unimportant) is not only more likely to suffer data rot, but also passes through the self-healing procedure far less often.
Of course, there is a solution for that too. ZFS-based systems such as Open-E JovianDSS provide the Scrub tool. As the name suggests, it scrubs all the data in the pool. What is the purpose of such a procedure? As mentioned, some of the data in our pool is rarely read, which carries the risk of undetected corruption. To prevent that, during a scrub the system reads all the data in the pool, checks it against the checksums, and if corrupted data is found, it marks the affected drive block as unusable and writes a copy recovered from parity or mirror to a new location on the drive. It is, in short, a drive-hygiene task, and it is necessary to ensure data integrity.
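Conceptually, a scrub is just the self-healing check applied proactively to every block instead of waiting for a read. The sketch below models that idea in Python; the `pool` layout (a list of dicts with a primary copy, a mirror copy, and the write-time checksum) is a hypothetical structure for illustration, not the on-disk ZFS format.

```python
import hashlib

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def write_block(data: bytes) -> dict:
    """Store a block redundantly, recording its checksum at write time."""
    return {"data": data, "mirror": data, "sum": checksum(data)}

def scrub(pool: list) -> int:
    """Walk every block in the pool, verify it, repair it from the mirror.

    Returns the number of blocks repaired.
    """
    repaired = 0
    for block in pool:
        if checksum(block["data"]) != block["sum"]:
            # Primary copy has rotted; restore it from the intact mirror.
            assert checksum(block["mirror"]) == block["sum"], "unrecoverable"
            block["data"] = block["mirror"]
            repaired += 1
    return repaired

pool = [write_block(b"rarely read data"), write_block(b"hot data")]
pool[0]["data"] = b"rarely re@d data"   # silent corruption nobody has noticed
assert scrub(pool) == 1                 # the scrub finds and fixes it
assert pool[0]["data"] == b"rarely read data"
```

The key point the model captures: without the scrub pass, that first block would have stayed corrupted until someone happened to read it.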
Until quite recently, storage admins may have felt a certain reluctance to perform this process due to its duration. The older scrub algorithm was indeed inefficient and resource-hungry, so scrubbing large volumes of data could take a very long time – sometimes even longer than a month. The main reason was that it traversed blocks in the order they were written rather than the order in which they sit on the disk, which on spinning drives translates into a flood of slow random reads.
Fortunately, this problem was solved by the implementation of a sequential scrub algorithm, which scans the metadata instead of the data itself to build an in-memory list of data blocks. The blocks are then sorted by size and offset, arranging them in the sequential order of their physical location on the drive, and only then are they read and checked against their checksums to identify corrupted data. This significantly speeds up the process, shortening it to several days (the exact gain depends on the type of data and how it is distributed on the disk). So, no more excuses!
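The heart of the improvement is the reordering step: turning a metadata-order block list into a disk-order one so the drive can read sequentially. A minimal sketch, with made-up `(offset, size)` pairs standing in for what the metadata scan would actually yield:

```python
# Phase 1: the metadata scan yields (offset, size) for each block in the
# order the blocks were written -- effectively random physical locations.
# The offsets below are purely illustrative.
birth_order = [(900, 128), (10, 64), (500, 256), (30, 64), (700, 128)]

# Sort the in-memory list by physical offset, so that phase 2 can sweep
# the platter front to back with large sequential reads instead of seeks.
sequential_order = sorted(birth_order, key=lambda block: block[0])

assert sequential_order == [(10, 64), (30, 64), (500, 256), (700, 128), (900, 128)]
```

On an HDD, reading the blocks in `sequential_order` avoids most head seeks, which is where the multi-week-to-several-days improvement comes from.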
The sequential scrub algorithm is available in Open-E JovianDSS starting from the Up29 version. Stay tuned and upgrade your storage!
How often should I scrub my disks? The precise answer depends on your particular setup and workload.
Protip! As a general rule, a scrub should be carried out at least once a quarter – this ensures high integrity of all your data, regardless of how frequently it is read.