Cluster Split-Brain explained [part 2]
Here it is, the second part of the cluster split-brain article series. In case you have missed the first part, you can read it here.
The previous part finished with a Cluster Split-Brain event, so let's start the second part from the moment when both nodes are serving the same cluster resources.
So, we currently have two servers (nodes) that are part of a cluster that has split. The nodes no longer "communicate" with each other, yet both have taken over, and are serving, the same cluster resources – ShareX and VIP1. Clients have been saving data to each node independently, so the two data sets have diverged and cannot simply be merged.
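To make the situation concrete, here is a minimal Python sketch of this state: two nodes, both claiming ShareX and VIP1, each accepting client writes on its own. The `Node` class and its fields are purely illustrative, not JovianDSS code.

```python
# Minimal sketch of the split-brain state described above; the Node
# class and its fields are illustrative, not JovianDSS code.

class Node:
    def __init__(self, name):
        self.name = name
        self.resources = {"ShareX", "VIP1"}  # both nodes claim the same resources
        self.data = {}                       # this node's local copy of the share

    def client_write(self, key, value):
        # With the cluster split, writes stay local; nothing reaches the peer.
        self.data[key] = value

node_a, node_b = Node("A"), Node("B")
node_a.client_write("report.doc", "saved via node A")
node_b.client_write("photo.jpg", "saved via node B")

# The two copies of ShareX have diverged; neither node holds both files.
print(node_a.data)  # {'report.doc': 'saved via node A'}
print(node_b.data)  # {'photo.jpg': 'saved via node B'}
```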
And now the mechanism that protects your whole environment against data loss after a cluster split-brain comes into play. When resources have been taken over and communication with the remote node is lost, each node marks itself as separated. This marking is used later, when communication between the nodes is restored (a code sketch of this logic follows further below). First, though, let's consider what would happen if the mechanism did not exist.
Imagine the cluster path / mirror path connection comes back. Both nodes have Pool0 imported, so both of them start synchronizing their data to the other node, overwriting the data that is already there – AT THE SAME TIME! It's a recipe for disaster, as the resulting data cannot be recovered. Even if we turn off node A before fixing the connection, repair the cluster path / mirror path, and only then start node A again, node A will join the cluster and put its disks under the control of node B, which manages the current Pool0. The data from node A will then be overwritten by the data from node B. That is also a disaster, but a smaller one, because we "only" lose the data from node A, and both nodes end up with the data belonging to node B. The same happens the other way around if node B is the one turned off.
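A toy demonstration of the simultaneous-resync case is sketched below. Real pool resynchronization works at block level, so this only models the end effect, but it shows why the data cannot be recovered.

```python
# Toy illustration of the failure mode described above (real pool
# resynchronization works at block level; this only models the effect):
# with no separation state, each node pushes its data to the peer at
# the same time, overwriting what the peer had.

def naive_mutual_resync(a: dict, b: dict) -> None:
    a_copy, b_copy = dict(a), dict(b)
    b.clear(); b.update(a_copy)  # node A overwrites node B's pool...
    a.clear(); a.update(b_copy)  # ...while node B overwrites node A's pool

pool_a = {"report.doc": "saved via node A"}
pool_b = {"photo.jpg": "saved via node B"}
naive_mutual_resync(pool_a, pool_b)

# Each pool now holds only the other side's old contents; in a real
# block-level sync the interleaved writes would leave both pools
# inconsistent and unrecoverable.
print(pool_a)  # {'photo.jpg': 'saved via node B'}
print(pool_b)  # {'report.doc': 'saved via node A'}
```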
So what difference does our separation mechanism make?
As we already know, thanks to this mechanism each node has marked itself as separated. After the cluster path / mirror path between the nodes is fixed, they first check the state of the other node. If both nodes are in separated mode, neither of them will share its disks with the other, which prevents overwriting and thus data loss. Even if a node was switched off before the communication line was repaired, and is switched back on only after the repair, it still holds the separated state on its side. When connecting to the cluster, it will detect the separated mode on the remote node as well and will not share its disks, which means the data stays safe.
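Here is the promised sketch of this logic in Python. The flag file and the function names are illustrative assumptions; the article does not say how JovianDSS stores the separated state internally, only that it persists across a node restart and is checked on reconnection.

```python
import os

# Hedged sketch of the separation logic described above. The flag file
# and function names are illustrative assumptions, not the actual
# JovianDSS implementation.

SEPARATED_FLAG = "separated.flag"  # hypothetical persistent marker

def on_peer_link_lost(holds_resources: bool) -> None:
    """Mark this node as separated when it serves resources alone."""
    if holds_resources:
        # The marker is persistent, so the node still knows it ran
        # separated even after being switched off and on again.
        with open(SEPARATED_FLAG, "w") as f:
            f.write("separated\n")

def is_separated() -> bool:
    return os.path.exists(SEPARATED_FLAG)

def on_link_restored(remote_separated: bool) -> str:
    """First step after the cluster/mirror path comes back: compare states."""
    if is_separated() and remote_separated:
        # Both nodes served the resources independently; an automatic
        # resync would overwrite one side's data, so neither node
        # shares its disks until an administrator intervenes.
        return "disks held back - manual intervention required"
    # No conflicting takeover happened; the nodes can rejoin normally.
    return "normal rejoin"
```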
Okay, but what can the client do?
Well, quite a lot actually. The user should inspect the data on both nodes and, for example, make a copy of the data from node A that is not present on node B. Next, they should force sharing of the disks from node A to node B and vice versa – this is possible thanks to a dedicated function in the Open-E JovianDSS webGUI. After this operation, node B will overwrite the data on node A, and the client can restore the data that node B did not have from the previously created backup.
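The recovery flow just described can be summarized in a short sketch, under the assumption that the administrator decides node B's data set should win. The function is illustrative; in practice these steps are performed through the Open-E JovianDSS webGUI, not a script.

```python
# Sketch of the manual recovery flow described above, assuming node B's
# data set is the one to keep. Illustrative only.

def recover_after_split_brain(node_a_data: dict, node_b_data: dict) -> dict:
    # 1. Back up the data that exists only on node A before it is lost.
    backup = {k: v for k, v in node_a_data.items() if k not in node_b_data}

    # 2. Force sharing of the disks; node B overwrites node A, so both
    #    nodes now hold node B's data set.
    merged = dict(node_b_data)

    # 3. Restore the A-only data from the backup made in step 1.
    merged.update(backup)
    return merged

print(recover_after_split_brain(
    {"report.doc": "A-only", "notes.txt": "old version"},
    {"photo.jpg": "B-only", "notes.txt": "new version"},
))
# {'photo.jpg': 'B-only', 'notes.txt': 'new version', 'report.doc': 'A-only'}
```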
But what about the mechanism that is supposed to prevent split-brain in the first place? Why did it not work in this case? Can the setup be made more secure?