After RAID 0 and RAID 1 (along with RAID 1+0 and RAID 0+1), it is time for RAID 2, 3 and 4. Below is a short description of each of these levels, which we hope will give a clear picture of how they work. Although this article is something of a history lesson, since these levels are rarely, if ever, used today, it is worth knowing where modern storage technologies come from.
RAID 2 – bit-level striping with dedicated Hamming-code parity
In RAID 2, data is striped at the bit level rather than the block level: each bit is written to a different drive. To protect that data, RAID 2 uses a Hamming code for error correction and stores it on dedicated disks.
Hamming code is a linear error-correcting code named after its inventor, Richard Hamming. In general, a code whose minimum Hamming distance between any two code words is d can detect up to d − 1 bit errors and correct up to ⌊(d − 1)/2⌋ bit errors. Classic Hamming codes have d = 3, so they can correct any single-bit error (or, if used for detection only, detect up to two-bit errors). In other words, a received word can be corrected reliably as long as its Hamming distance from the transmitted word is at most ⌊(d − 1)/2⌋. By contrast, a simple parity bit cannot correct errors at all and detects only an odd number of flipped bits.
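As a concrete illustration, the Python sketch below implements the classic Hamming(7,4) code, which has a minimum distance of d = 3 and can therefore correct any single flipped bit. The function names are hypothetical, and treating each of the seven bits as stored on its own disk (four data disks, three check disks) is an illustrative assumption about how a RAID 2 array could lay them out, not a description of a specific product.

```python
def hamming74_encode(d):
    """Encode four data bits [d1, d2, d3, d4] as a 7-bit code word.

    Positions are numbered 1..7; check bits sit at positions 1, 2 and 4.
    """
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # checks positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # checks positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # checks positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(c):
    """Correct at most one flipped bit; return (data bits, error position or 0)."""
    c = list(c)
    syndrome = 0
    for pos, bit in enumerate(c, start=1):
        if bit:
            syndrome ^= pos   # the positions of 1-bits in a valid word XOR to 0
    if syndrome:
        c[syndrome - 1] ^= 1  # the syndrome names the flipped position
    return [c[2], c[4], c[5], c[6]], syndrome


word = hamming74_encode([1, 0, 1, 1])  # one bit per disk in a 7-disk array
word[4] ^= 1                           # simulate a fault on disk 5
print(hamming74_decode(word))          # ([1, 0, 1, 1], 5)
```

The syndrome trick works because each check bit forces one bit of the XOR of all 1-positions to zero, so a single flip leaves exactly the flipped position behind.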
In RAID 2, the number of disks holding Hamming code grows roughly with the logarithm of the number of data disks they protect. All disks in a RAID 2 array work together as a single logical disk whose capacity equals the combined capacity of the data disks.
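To see the logarithmic relationship, note that a single-error-correcting Hamming code with r check bits can protect at most 2^r − r − 1 data bits. The hypothetical helper below is a minimal sketch that assumes single-error correction only (no extra bit for double-error detection) and finds the smallest number of check disks for a given number of data disks.

```python
def hamming_check_disks(data_disks: int) -> int:
    """Smallest r such that 2**r >= data_disks + r + 1 (single-error correction)."""
    r = 0
    while 2 ** r < data_disks + r + 1:
        r += 1
    return r


for n in (4, 11, 26, 32):
    print(n, "data disks ->", hamming_check_disks(n), "check disks")
# 4 data disks -> 3 check disks
# 11 data disks -> 4 check disks
# 26 data disks -> 5 check disks
# 32 data disks -> 6 check disks
```

Going from 4 to 32 data disks (eight times as many) adds only three check disks, which is the logarithmic growth described above.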
When RAID 2 is used, all disks must be synchronized. The controller has to keep the spindles rotating in lockstep, at the same angular orientation, so that every disk reaches the index at the same moment. If the disks fall out of sync, the array becomes unusable.
This is not the only drawback. Generating long Hamming codes for every write can also slow the whole system down.
The way RAID 2 works may be hard to grasp. The need for Hamming code and special disk controllers is a large part of why RAID 2 never became popular. Looked at less pragmatically, though, it is quite interesting, mainly because of how it operates: it is considerably more complex than RAID 0 and RAID 1. When everything works well, RAID 2 provides good data protection. If a drive fails, whether it holds data or Hamming code, its contents can be reconstructed from the remaining disks.
Interesting as it is, RAID 2 saw almost no commercial use. It appeared only in the early days of RAID, before disk drives gained their own built-in error-correcting code (ECC). Modern HDDs apply their own error-correction algorithms internally, so an extra Hamming layer at the array level brings little benefit. This is why RAID 2 lost its appeal in professional use and is no longer implemented in modern controllers.
RAID 3 – another rare one in practice
RAID 3 stripes data across disks much as RAID 0 does, but explicitly at the byte level, and it adds one extra disk to the array. That disk stores parity information, often calculated with the help of a dedicated processor on the controller, so we can call it “the parity disk”.
In a RAID 3 configuration, data are divided into individual bytes, and consecutive bytes are written to consecutive data disks. For each row of data bytes, a parity byte is calculated and saved on the parity disk. If one disk fails, each of its bytes can be recovered from the remaining bytes in the same row and the corresponding parity byte.
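The row-by-row parity described above can be sketched in a few lines of Python. This assumes XOR parity (the usual choice, since XOR-ing the surviving bytes of a row with its parity byte yields the missing byte), zero-padding of the last row, and hypothetical function names; a real controller works on physical sectors rather than Python byte arrays.

```python
from functools import reduce


def xor_row(row):
    """XOR all bytes in one row together."""
    return reduce(lambda a, b: a ^ b, row)


def stripe_bytes(data: bytes, data_disks: int):
    """Spread data byte by byte over the data disks and build the parity disk."""
    if len(data) % data_disks:
        # Zero-pad so the last row is complete (an illustrative assumption).
        data += bytes(data_disks - len(data) % data_disks)
    disks = [bytearray() for _ in range(data_disks)]
    for i, byte in enumerate(data):
        disks[i % data_disks].append(byte)   # byte i goes to disk i mod N
    parity = bytearray(xor_row(row) for row in zip(*disks))
    return disks, parity


def rebuild(disks, parity, failed: int) -> bytearray:
    """Recover a failed data disk from the surviving disks and the parity disk."""
    survivors = [d for i, d in enumerate(disks) if i != failed] + [parity]
    return bytearray(xor_row(row) for row in zip(*survivors))


disks, parity = stripe_bytes(b"RAID3!", 3)
assert rebuild(disks, parity, failed=1) == disks[1]
```

Note that every write changes at least one byte in a row, so the parity disk is written on every write; this is the bottleneck discussed below.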
Although RAID 3 is rarely used in practice, it does have advantages. First, it survives the failure of any single disk in the array. Second, it offers high read speed. Unfortunately, it also has several drawbacks.
While read speed is more than satisfactory, write speed is not, because parity must be calculated for every write (even hardware RAID controllers cannot fully eliminate this cost). The second drawback concerns disk failure: when a disk fails, the whole array runs much more slowly, because the missing data has to be recomputed from parity. So although RAID 3 survives the failure of one disk, replacing the damaged disk and rebuilding its contents is very costly. A third problem is the parity disk itself: it takes part in every write, so it is usually the bottleneck of the entire array.
As these drawbacks show, RAID 3 is not a particularly good, reliable, or cheap solution, which is why, as mentioned earlier, it is rarely used in practice. RAID 3 is mainly suited to deployments in which a small number of users access very large files.
RAID 4 – works similarly to RAID 3 and 5
RAID 4 is very similar to RAID 3. The main difference is how the data is split: RAID 4 divides it into blocks (for example 16, 32, 64, or 128 kB) and writes them across the disks, as RAID 0 does. For each row (stripe) of data blocks, a parity block is calculated and written to the dedicated parity disk. In short, unlike RAID 3, RAID 4 stripes data at the block level rather than the byte level (block-level striping with a dedicated parity disk).
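The block-level layout can be expressed as a simple address mapping. In this sketch the last disk is assumed to be the parity disk, the 64 kB block size is just one of the sizes listed above, and `raid4_locate` is a hypothetical name; it returns the data disk and the byte offset on that disk for a given logical block number.

```python
BLOCK_SIZE = 64 * 1024  # one of the block sizes mentioned above (assumed)


def raid4_locate(logical_block: int, disks: int, block_size: int = BLOCK_SIZE):
    """Map a logical block to (data disk, byte offset) in a RAID 4 array.

    Disks 0 .. disks-2 hold data; disk disks-1 is the dedicated parity disk.
    """
    data_disks = disks - 1                  # last disk is reserved for parity
    stripe = logical_block // data_disks    # which row of blocks
    disk = logical_block % data_disks       # which data disk within that row
    return disk, stripe * block_size


for lb in range(6):
    print(lb, raid4_locate(lb, disks=4))
```

With four disks, logical blocks 0 to 2 fill the first stripe on disks 0 to 2, block 3 starts the second stripe back on disk 0, and disk 3 holds the parity for every stripe.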
RAID 4 also resembles RAID 5, except that it confines all parity data to a single drive instead of distributing it across all disks in the array.
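The difference in parity placement can be shown with two small functions. The RAID 5 rotation used here (parity moving one disk to the left on each stripe) is an assumed, simplified layout; real implementations offer several rotation schemes. Both function names are hypothetical.

```python
def raid4_parity_disk(stripe: int, disks: int) -> int:
    """RAID 4: parity for every stripe lives on the same (last) disk."""
    return disks - 1


def raid5_parity_disk(stripe: int, disks: int) -> int:
    """RAID 5 (one simple rotation, assumed): parity shifts left each stripe."""
    return (disks - 1 - stripe) % disks


for s in range(4):
    print(s, raid4_parity_disk(s, 4), raid5_parity_disk(s, 4))
```

Because RAID 4 puts every stripe's parity on the same disk, every write touches that disk, whereas the rotating layout spreads parity writes over all disks.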
RAID 4 requires at least three disks: two for data and one for parity. It also needs parity to be calculated, a task traditionally offloaded to a hardware controller. Parity is what makes it possible to recover a failed disk's data through the appropriate mathematical operation on the surviving blocks of each stripe.
What is RAID 4 for? It fits one particular need well: huge files that are read and written sequentially. Using RAID 4 for small portions of data is not a good idea, because every small write also requires updating the parity block on the single parity disk. Repeating this operation continuously costs a lot of time and slows down the whole system.
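The cost of a small write comes from the read-modify-write cycle: to update one block, the controller reads the old data and the old parity, then writes the new data and a new parity computed as old parity XOR old data XOR new data. The sketch below, assuming XOR parity and using hypothetical function names, checks that this shortcut gives the same result as recomputing parity over the whole stripe.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Full-stripe parity: XOR all blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def small_write_parity(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
    """Read-modify-write update: new parity = old parity ^ old data ^ new data."""
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))


# Hypothetical stripe of three 2-byte data blocks plus its parity block.
blocks = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb"]
parity = xor_blocks(*blocks)

# Overwrite block 1 without reading blocks 0 and 2.
new_block = b"\xff\x00"
new_parity = small_write_parity(blocks[1], new_block, parity)
assert new_parity == xor_blocks(blocks[0], new_block, blocks[2])
```

Each small write therefore costs two reads and two writes, and in RAID 4 the parity half of that work always lands on the same disk.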