We are sure some of you have thought about DRBD – about its use and operation. Here is some information about it – a few benefits, some drawbacks, tips and solutions.
The volume replication mechanism built into DSS V6 is based on the open source DRBD software, developed by the Austrian company LINBIT. DRBD is popular and widely used in High Availability (HA) clusters. Volume replication creates a mirror copy of the data on another server, to be used if the main server fails. This protects the data and avoids periods of unavailability.
Basically, DRBD mirrors a whole block device over an assigned network. In DSS V6, the block device is an LVM logical volume, and the DRBD protocol implementation is used for synchronous mirroring. This means that the file system on the primary node is notified that a block write has finished only when the block has been successfully written on both the primary and the secondary node. This is crucial for HA clusters, where every piece of information is precious and cannot be lost. As always, such a solution has pros and cons.
The main disadvantage is that write speed depends on the connection speed between the primary and secondary node. Moreover, if the primary node uses really fast hardware (H/W RAID, CPU, memory, NICs) and the secondary node consists of low-end hardware, the overall write performance on logical volumes with volume replication enabled will suffer significantly. The reason is that although the primary node can write the data really fast to its local H/W RAID, it still has to wait for confirmation from the secondary node that the data was written successfully. Only then can the primary node confirm to the host that the data was stored and that it is ready to receive and store more data.
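The dependency described above can be put into a rough model. The following is a minimal sketch, not a description of DRBD internals: the function name and the sample timings are made up for illustration, and it assumes that the local write and the transfer to the secondary node run in parallel.

```python
def sync_write_latency_us(local_write_us, remote_write_us, network_rtt_us):
    """Estimate the latency of one synchronously mirrored write, in microseconds.

    Assumption: the primary writes locally while the block travels to the
    secondary. The write is acknowledged to the host only after BOTH copies
    are confirmed, so the slower of the two paths decides the result.
    """
    remote_path_us = network_rtt_us + remote_write_us
    return max(local_write_us, remote_path_us)


# Hypothetical numbers: identical fast nodes on a fast link.
balanced = sync_write_latency_us(200, 200, 100)

# Same fast primary, but a low-end secondary with slow disks.
slow_secondary = sync_write_latency_us(200, 5000, 100)

print(balanced, slow_secondary)
```

In this model the fast primary gains nothing from its local H/W RAID once the secondary path is slower: the host sees the latency of the slower node plus the network.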
It’s easy to see that there are several elements which can be a possible bottleneck for the volume replication process and can decrease the overall performance. That’s why it is good to spend more time fully optimizing the I/O write performance on both of your systems before volume replication is enabled. I/O performance depends on multiple factors and will vary mostly depending on your hardware characteristics. There are no recommended settings that can make your storage hardware blazing fast. The only way to find the best performance settings for your environment is… trial and error. You can also make use of available I/O benchmarks.
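If no dedicated benchmark tool is at hand, a very rough comparison of the two systems can be made with a few lines of Python. This is only a sketch under stated assumptions: the function name, block size and block count are arbitrary choices for illustration, and the result says nothing about random I/O or cache effects.

```python
import os
import tempfile
import time


def measure_sync_write_mb_per_s(directory, block_size=64 * 1024, block_count=64):
    """Write block_count blocks, flushing each one to disk, and return MB/s."""
    payload = os.urandom(block_size)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(block_count):
            os.write(fd, payload)
            os.fsync(fd)  # wait until the block has reached stable storage
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.remove(path)
    total_mb = (block_size * block_count) / (1024 * 1024)
    return total_mb / elapsed


if __name__ == "__main__":
    rate = measure_sync_write_mb_per_s(tempfile.gettempdir())
    print(f"{rate:.1f} MB/s")
```

Running the same script against the storage of both nodes gives a first impression of whether one of them is likely to hold the other back.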
When synchronous mirroring is used, one node is treated as the active (source) and the other as the passive (destination). The natural consequence is that data can be accessed only on the active node. The passive node is “locked” and cannot be used until it is switched to active.
There is a way to bypass these limitations: create a snapshot on the destination node and access the data via the snapshot. However, this is only a partial solution, as the data on the snapshot is read-only.
The DSS V6 implementation of DRBD works in primary-secondary mode. When volume replication is enabled, both nodes hold exactly the same data, that is, they are consistent. If the secondary node fails and is later repaired and brought back online, DRBD automatically resynchronizes it from the primary node. As a result, the secondary node again contains the latest version of the data.
If both nodes were in a consistent state and the primary node fails, the data can still be accessed on the secondary node. However, to get read/write access to the data, the logical volumes on the secondary node must be switched to “source” mode. When the old primary node is repaired and running again, a reverse replication task can be used to synchronize it with the secondary node (which now stores the latest version of the data). After those replication tasks are completed, the modes must be switched manually to make the old primary node the source node again and the secondary node a destination node. Running the replication tasks, and thus the node synchronization process, goes unnoticed by all hosts using resources on this particular server (NFS, iSCSI).
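The order of the steps matters here, so it may help to see them as a small state model. This is a hypothetical sketch only: the `Node` class and the function names are invented for illustration and do not correspond to any DSS V6 or DRBD interface.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    mode: str              # "source" or "destination"
    online: bool = True
    data_version: int = 0  # stands in for "how recent is the data"


def fail_over(failed, survivor):
    """Primary failed: switch the surviving node to source mode."""
    failed.online = False
    survivor.mode = "source"


def reverse_replicate(repaired, current_source):
    """Repaired node rejoins as destination and receives the latest data."""
    repaired.online = True
    repaired.mode = "destination"
    repaired.data_version = current_source.data_version


def fail_back(old_primary, old_secondary):
    """Manual mode switch, allowed only once both nodes are consistent."""
    if old_primary.data_version != old_secondary.data_version:
        raise RuntimeError("nodes are not consistent; finish replication first")
    old_secondary.mode = "destination"
    old_primary.mode = "source"


node_a = Node("A", "source")
node_b = Node("B", "destination")

fail_over(node_a, node_b)          # A fails, B takes over
node_b.data_version += 3           # hosts keep writing to B
reverse_replicate(node_a, node_b)  # A is repaired and catches up
fail_back(node_a, node_b)          # roles are switched back manually
print(node_a.mode, node_b.mode)
```

The consistency check in `fail_back` reflects the rule from the text: the modes are switched back only after the reverse replication has completed.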
To make resynchronization possible, DRBD needs to gather detailed information about what exactly happens on both nodes involved in the volume replication process. The gathered information is called meta data and includes:
the size of the DRBD device,
the Generation Identifier,
the Activity Log,
the quick-sync bitmap.
DSS V6 stores meta data externally, which means that it is located on a separate, dedicated block device, different from the one where the production data is stored. The benefit of this approach is lower latency for some write operations.
DRBD uses the information included in the meta data to determine whether two nodes are in fact members of the same cluster or were connected by mistake, and to determine the direction of background resynchronization. It also uses it to decide whether a full resynchronization is necessary or a partial one is sufficient, and to detect a split brain.
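The difference between full and partial resynchronization can be illustrated with a toy version of a quick-sync bitmap. This is a simplified sketch, not DRBD's actual on-disk format or algorithm: the class, the function and the eight-block volume are assumptions made for the example.

```python
class QuickSyncBitmap:
    """Tracks which blocks changed while the peer node was unreachable."""

    def __init__(self, block_count):
        self.dirty = [False] * block_count

    def mark_written(self, block_index):
        self.dirty[block_index] = True

    def blocks_to_resync(self):
        return [i for i, is_dirty in enumerate(self.dirty) if is_dirty]

    def clear(self):
        self.dirty = [False] * len(self.dirty)


def partial_resync(primary_blocks, secondary_blocks, bitmap):
    """Copy only the blocks marked dirty; return how many were copied."""
    to_copy = bitmap.blocks_to_resync()
    for index in to_copy:
        secondary_blocks[index] = primary_blocks[index]
    bitmap.clear()
    return len(to_copy)


# Both nodes start out consistent (toy volume of 8 blocks).
primary = ["a"] * 8
secondary = ["a"] * 8
bitmap = QuickSyncBitmap(len(primary))

# The secondary is down; the primary keeps accepting writes.
for index, value in [(2, "b"), (5, "c"), (2, "d")]:
    primary[index] = value
    bitmap.mark_written(index)

copied = partial_resync(primary, secondary, bitmap)
print(copied, secondary == primary)
```

Only the blocks touched during the outage are transferred. Without such a record of changes, the only safe option would be to copy the whole device.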
Although meta data is stored in a different place than production data, a hardware failure can still corrupt one of the logical volumes. As a result, synchronization based on the current meta data may no longer be possible. That is why, in some situations, it is necessary to clear the meta data on both the source and destination logical volumes in order to reestablish the replication connection.
DRBD® and the DRBD® logo are trademarks or registered trademarks of LINBIT® in Austria, USA and other countries.