The Magic of Failover or How to Avoid the System Downtime

August 06, 2024
While every data storage provider focuses on protection through encryption and vividly describes ransomware, malware, and other kinds of hacker attacks, it is easy to forget about disasters that have nothing to do with attackers. By these we mean downtime caused by fires, floods, and plain system failures, which can happen regardless of how secure your solution is. Let’s look at some statistics:

The AFR (annualized failure rate) for hard drives since the beginning of 2023 looks like this:
- the AFR in 2023 reached:
  - for Q1 2023 – 1.54%,
  - for Q2 2023 – 2.28%,
  - for Q3 2023 – 1.47%,
  - for Q4 2023 – 1.53%,
- the AFR in 2024 reached:
  - for Q1 2024 – 1.41%.
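
To put these percentages in perspective, here is a minimal sketch of what a per-drive AFR implies for a whole array. It assumes that drive failures are independent and that every drive has the same AFR, which is a simplification; the function name and the 24-drive array are hypothetical and only serve as an illustration.

```python
def prob_any_drive_fails(afr: float, drives: int) -> float:
    """Probability that at least one of `drives` disks fails within a year.

    Assumption: failures are independent and all disks share the same
    annualized failure rate (AFR), given as a fraction (0.0141 for 1.41%).
    """
    if not 0.0 <= afr <= 1.0:
        raise ValueError("afr must be a fraction between 0 and 1")
    if drives < 0:
        raise ValueError("drives must not be negative")
    # Chance that every disk survives the year, then take the complement.
    return 1.0 - (1.0 - afr) ** drives


if __name__ == "__main__":
    # Q1 2024 AFR quoted above, applied to a hypothetical 24-drive array.
    print(f"{prob_any_drive_fails(0.0141, 24):.1%}")
```

Under these assumptions, even a low per-drive AFR adds up quickly once many drives are involved, which is why redundancy matters at the system level.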

75% of small businesses do not have a disaster recovery plan.

44% of enterprises indicate that hourly downtime costs range from over $1 million to over $5 million.

91% of organizations said a single hour of downtime costs over $300,000 on average.

In 60% of the cases, the reason for data loss is a hardware failure.

So what can your company do to ensure the constant availability of the data stored on a server? The failover functionality of Open-E JovianDSS, available in cluster configurations, makes it possible to avoid downtime in your workloads.

In this article, we will explain failover in simple terms by answering two essential questions:

What is failover?
When is failover possible?

Let’s get deeper into the topic.

What is failover?

Before describing it in real life, let’s look at a more technical definition of failover. Here’s one from TechTarget:

Failover is a backup operational mode in which the functions of a system component are assumed by a secondary component when the primary becomes unavailable. An organization can fail over either after failure or during scheduled down time.
Failover is an integral part of mission-critical systems and often a key component of disaster recovery.

Well, it doesn’t sound so simple, right? Let us explain it with an example that is easy to imagine. Assume you keep the memorable moments of your life in the camera roll of your cellphone, so you can browse the photos at any time. In addition to that, you have a backup copy of those pictures and videos on your computer. This way, if your cellphone breaks, drowns, or suffers anything else you don’t wish to happen to your device, you will still have a copy of the camera roll and access to it on the computer. Moreover, when you repair the phone or buy a new one, you can copy the camera roll back to it.

Failover works similarly. It allows you to switch the activity from one node to the other in case of any kind of failure, and back again once the failure is fixed. To bring the failed node up to date with the data stored on the other one, the system initiates a process called resilvering, which compares the state of both nodes and updates the outdated one to the latest version. To be more precise: all data is ultimately binary, so a checksum can be computed over it. Your data storage solution compares those checksums on both nodes (checksumming) and runs the resilvering process until the data stored on both nodes is identical.
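
The compare-and-repair idea can be sketched in a few lines. This is a toy model under stated assumptions, not how Open-E JovianDSS or ZFS implements resilvering internally: each node is modeled as a plain list of byte blocks, and the function names `checksum`, `stale_blocks`, and `resync` are hypothetical.

```python
import hashlib


def checksum(block: bytes) -> str:
    """Return a SHA-256 digest that identifies the content of one block."""
    return hashlib.sha256(block).hexdigest()


def stale_blocks(source: list[bytes], target: list[bytes]) -> list[int]:
    """Indices of source blocks that are missing or different on the target."""
    stale = []
    for i, block in enumerate(source):
        if i >= len(target) or checksum(target[i]) != checksum(block):
            stale.append(i)
    return stale


def resync(source: list[bytes], target: list[bytes]) -> list[int]:
    """Copy only the stale blocks to the target and return their indices."""
    stale = stale_blocks(source, target)
    for i in stale:  # indices are ascending, so appends stay in order
        if i < len(target):
            target[i] = source[i]
        else:
            target.append(source[i])
    del target[len(source):]  # drop blocks the source no longer has
    return stale


if __name__ == "__main__":
    healthy = [b"a", b"b", b"c"]   # node that kept running
    recovered = [b"a", b"old"]     # node that was down and missed writes
    print(resync(healthy, recovered), recovered == healthy)
```

Only the blocks whose checksums differ are copied, which is the reason the repaired node can catch up without a full copy of the data.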

When is failover possible?

As you have probably guessed from the description above, failover takes two nodes or, as a data storage specialist would say, a cluster. However, if you consider failover in a “cluster-in-a-box” configuration, remember that both nodes are kept within the same physical system, so it will not save you from a power failure of the server, a flood, or a fire (but if the disks fail, failover will be your best shot).

In Open-E JovianDSS, there are several cluster options that can ensure the constant availability of data: shared storage cluster configurations using the Fibre Channel protocol (where the link can be up to 300 km long, provided the low-latency requirement is met) and non-shared storage clusters with Ethernet connections. Thanks to such a long distance, the second node in the cluster can be located in a different building or even a different city (in general, an off-site location). That is why Open-E JovianDSS On- and Off-site Data Protection can keep your data protected and available regardless of the kind of hardware failure or natural disaster that takes place.

As for data replication, there are two alternative cluster configurations in Open-E JovianDSS: active-active and active-passive. The terms speak for themselves. In the active-active configuration, both nodes work as production nodes and balance the load between them, which makes reads and writes faster because both parts of the cluster do active work. In the active-passive configuration, one node serves as a “redundant medium” for the other by simply copying the state of the active node.

Other key features of the Open-E JovianDSS High-Availability Cluster are:

- Multiple Protocols – the cluster supports SMB, NFS, iSCSI, and FC protocols.
- Multiple Storage Types – it supports two server nodes connected to a single storage, and one or more JBODs. It also supports SAS and FC SSD/HDD disks.
- Advanced Cluster Management Software – it enables quick access to all features related to the cluster setup.
- Independent Virtual IP (VIP) Feature – it creates a connection to the data that is available regardless of which node is active at a given time.
- Automatic Failover – high availability is achieved by detecting hardware failures and automatically moving the VIP from the primary to the secondary node without the client servers noticing a timeout.
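
The interplay between the virtual IP and automatic failover described in the list above can be illustrated with a small simulation. It is a sketch under stated assumptions and does not reflect the actual Open-E JovianDSS implementation: the `Node` and `Cluster` classes are hypothetical, health is a simple flag instead of a real heartbeat, and `192.0.2.10` is a documentation-only address.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    healthy: bool = True  # stand-in for a real hardware health check


class Cluster:
    """Two-node toy cluster that exposes a single virtual IP (VIP)."""

    def __init__(self, primary: Node, secondary: Node, vip: str) -> None:
        self.nodes = [primary, secondary]
        self.vip = vip
        self.active = primary  # node that currently holds the VIP

    def check(self) -> Node:
        """Run one health check and move the VIP if the active node is down."""
        if not self.active.healthy:
            standby = next(
                (n for n in self.nodes if n is not self.active and n.healthy),
                None,
            )
            if standby is None:
                raise RuntimeError("no healthy node available")
            self.active = standby
        return self.active

    def serve(self, request: str) -> str:
        """Clients only ever address the VIP, never an individual node."""
        node = self.check()
        return f"{self.vip} -> {node.name}: {request}"


if __name__ == "__main__":
    a, b = Node("node-a"), Node("node-b")
    cluster = Cluster(a, b, vip="192.0.2.10")
    print(cluster.serve("read"))
    a.healthy = False  # simulated hardware failure on the primary
    print(cluster.serve("read"))
```

Because clients address the VIP rather than a specific node, the request format stays the same before and after the simulated failure; only the node behind the address changes.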

In order to see how the failover functionality in Open-E JovianDSS works, check out the video below:

Janusz Bak

Chief Technology Officer

Janusz Bak joined Open-E in 1999 and has been serving as Open-E's CTO ever since. Janusz has over 30 years of software engineering experience and is a recognized expert on storage technologies. Before Open-E, Janusz headed up German support operations at Aztech Systems and at Mega.
