Failover

Failover is the automatic switching of services from a failed system component to a standby component to ensure continued availability and minimal downtime.

What Is Failover?


Failover is a high-availability (HA) mechanism that ensures uninterrupted service by automatically transferring operations to a backup system component when a failure occurs.


It can apply to:

  • Network links
  • Storage controllers
  • Server nodes
  • Virtual machines
  • Applications or services

Failover helps maintain continuous uptime, even during:

  • Hardware crashes
  • Software errors
  • Power outages
  • Maintenance tasks

Types of Failover


  • Active-Passive Failover: A primary system actively handles traffic while a secondary remains on standby. When failure is detected, services are redirected to the passive node.
  • Active-Active Failover: Both nodes are operational and share the workload. If one fails, the other takes over the full load, so each node needs enough spare capacity to carry it alone. This suits deployments that want both balanced performance and resilience.
  • Manual Failover: Requires administrator intervention to redirect services. Less automated but useful in controlled environments or for staged recovery.
  • Automated Failover: Uses monitoring and heartbeat mechanisms to detect failure and trigger a switchover without manual input—minimizing downtime risk.
  • Application-Level Failover: Specific services (e.g., databases, VMs) are restarted or migrated to alternate systems in response to localized crashes or overload.
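
The automated, active-passive case described above can be sketched in a few lines. This is a minimal illustration, not the implementation of any particular product: the class name `HeartbeatMonitor`, the node names, and the 3-second timeout are all assumptions chosen for the example, and time is passed in explicitly so the behaviour is easy to follow.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Tracks heartbeats from the active node and decides when to fail over."""

    timeout_s: float = 3.0   # hypothetical: longer silence counts as a failure
    active: str = "node-a"   # hypothetical node names
    standby: str = "node-b"
    last_beat: float = 0.0
    events: list = field(default_factory=list)

    def beat(self, now: float) -> None:
        """Record a heartbeat received from the active node at time `now` (seconds)."""
        self.last_beat = now

    def check(self, now: float) -> str:
        """Promote the standby if the active node has been silent for too long.

        Returns the name of the node that should be serving traffic.
        """
        if now - self.last_beat > self.timeout_s:
            failed = self.active
            # Swap roles: the standby becomes active, the failed node becomes standby.
            self.active, self.standby = self.standby, failed
            # Start a fresh timeout window for the newly promoted node.
            self.last_beat = now
            self.events.append((now, f"failover {failed} -> {self.active}"))
        return self.active


monitor = HeartbeatMonitor()
monitor.beat(now=1.0)
print(monitor.check(now=2.0))  # 1 s of silence is within the timeout: node-a keeps serving
print(monitor.check(now=6.0))  # 5 s of silence exceeds the timeout: node-b is promoted
print(monitor.events)          # the switchover is recorded for later review
```

Production clusters usually add fencing or quorum so that two nodes never both consider themselves active; the sketch leaves that out to keep the detection logic visible.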

Benefits of Failover



  • Minimizes service disruption: Users experience little to no downtime during failures, keeping applications and systems operational during critical periods.
  • Protects business continuity: Ensures access to data and systems even during maintenance, disasters, or infrastructure failures—crucial for regulated industries.
  • Reduces support response time: Automated failover restores service without waiting for manual troubleshooting, so IT teams can investigate the root cause once the service is back.
  • Enables maintenance with zero downtime: Systems can be patched or updated while services are temporarily rerouted to backup nodes or locations.
  • Strengthens disaster recovery strategies: Failover complements backups and replication, enabling near-instant access to redundant resources after a system event.

Failover in Open‑E JovianDSS


Open-E JovianDSS includes built-in high-availability clustering features that provide enterprise-class failover for storage systems:

  • Active-Passive and Active-Active clustering: Configurations allow for fully redundant nodes that automatically take over services and mount points when a failure is detected.
  • Heartbeat and watchdog monitoring: Nodes communicate constantly to verify health. If a heartbeat is lost, failover is triggered immediately to the standby node.
  • Shared storage with synchronized metadata: Both nodes have access to the same storage pool, ensuring seamless transition and up-to-date data visibility.
  • Failover of NAS, iSCSI, and Fibre Channel targets: HA setups cover block storage targets and protocols as well as file shares.
  • Email alerts and logging: All failover actions are logged and reported, allowing for later auditing and diagnostics.

Best Practices for Data Storage Failover


  • Test failover procedures regularly: Simulate failures under controlled conditions to confirm automatic switchover works as expected—and to train staff.
  • Monitor HA components proactively: Use SNMP, logs, and dashboards to catch degraded hardware or link errors early, before they cause an avoidable failover.
  • Use redundant power, cooling, and networking: Failover is most effective when supporting infrastructure also avoids single points of failure.
  • Keep firmware and OS updated: Prevent known bugs from interfering with cluster operation or HA logic by following a strict update policy.
  • Document roles and dependencies clearly: Record which node handles which service and how data is synchronized between them, so administrators know what a failover will move.
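
When testing failover, it helps to measure how long the service was actually unreachable rather than only checking that the switchover happened. The sketch below assumes a drill in which the service is probed at regular intervals and each probe is recorded as a `(timestamp_s, ok)` pair; the function name `observed_downtime`, the sample data, and the 5-second target are hypothetical.

```python
def observed_downtime(probes):
    """Return the longest outage, in seconds, in a list of (timestamp_s, ok) probes.

    An outage starts at the first failed probe and ends at the next successful one.
    Probes must be sorted by timestamp.
    """
    longest = 0.0
    outage_start = None
    for ts, ok in probes:
        if not ok and outage_start is None:
            outage_start = ts                     # first failed probe of an outage
        elif ok and outage_start is not None:
            longest = max(longest, ts - outage_start)
            outage_start = None                   # service has recovered
    if outage_start is not None:
        # Still down when the drill ended: count up to the last probe.
        longest = max(longest, probes[-1][0] - outage_start)
    return longest


# Hypothetical drill: one probe per second, active node powered off around t=3.
drill = [
    (0.0, True), (1.0, True), (2.0, True),
    (3.0, False), (4.0, False), (5.0, False),
    (6.0, True), (7.0, True),
]
MAX_ALLOWED_S = 5.0  # hypothetical target for this drill

downtime = observed_downtime(drill)
print(f"longest outage: {downtime} s")
print("within target" if downtime <= MAX_ALLOWED_S else "target exceeded")
```

The result is only as precise as the probe interval, so a shorter interval gives a tighter estimate of the real switchover time.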



Further Resources


The Magic of Failover to Avoid the System Downtime

A readable explanation of failover fundamentals and how Open-E’s clustering keeps workloads running even when hardware fails.

KnowledgeBase Link

Optimizing the Non-Shared Data Storage HA Cluster

Learn how Open-E implements non-shared HA clusters using mirroring to provide redundancy without shared storage, and why mirroring may outperform RAID in that context.

KnowledgeBase Link

Open-E JovianDSS Advanced Metro High Availability Cluster Feature Pack

Describes built-in failover functionality in JovianDSS for iSCSI, FC, NFS & SMB, supporting load-balanced HA clusters across metro distances.

KnowledgeBase Link