Erasure Coding

Erasure coding is a data protection method that breaks data into fragments, adds redundancy through parity blocks, and distributes them across multiple drives or systems for fault tolerance and storage efficiency.

What Is Erasure Coding?


Erasure coding is a mathematical technique for data redundancy that splits original data into smaller fragments, generates parity fragments, and distributes all pieces across multiple storage devices or nodes.
If some fragments are lost or corrupted, the original data can still be reconstructed from the remaining pieces, as long as a threshold number is preserved.


Erasure coding schemes are typically written as k + m, where:
k = number of data blocks
m = number of parity blocks
→ Example: 6+3 means 9 total blocks (6 data + 3 parity); any 6 of the 9 are enough to reconstruct the data, so up to 3 blocks can be lost.
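
The k + m notation can be turned into a few numbers directly. The sketch below is illustrative only: the Scheme class and its property names are made up for this example, and it assumes a code in which any k of the k + m blocks are sufficient, as in the 6+3 example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scheme:
    """Hypothetical helper describing a k + m erasure coding layout."""
    k: int  # number of data blocks
    m: int  # number of parity blocks

    @property
    def total_blocks(self) -> int:
        return self.k + self.m

    @property
    def max_lost_blocks(self) -> int:
        # Any k of the k + m blocks suffice, so up to m may be missing.
        return self.m

    @property
    def storage_efficiency(self) -> float:
        # Fraction of raw capacity that holds actual data.
        return self.k / self.total_blocks


scheme = Scheme(k=6, m=3)
print(scheme.total_blocks, scheme.max_lost_blocks, round(scheme.storage_efficiency, 3))
# prints: 9 3 0.667
```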

How Erasure Coding Works


  • Data is split into fragments: A file or dataset is divided into equally sized blocks, depending on the system’s configuration (e.g., k = 4 blocks).
  • Parity is generated using algorithms: Mathematical formulas produce extra blocks (m) that provide redundancy and allow reconstruction if original blocks are lost.
  • All blocks are distributed: The k + m blocks are spread across different disks, servers, or racks to tolerate hardware failures or outages.
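  • Reconstruction is dynamic: When data is read, it is assembled in memory from the available fragments; if some drives are offline or corrupted, the missing fragments are recomputed from the remaining ones, provided at least k fragments are readable.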
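
The steps above can be sketched with the simplest possible parity scheme: a single XOR parity block (k + 1). This is a deliberate simplification, not how production systems implement it. Schemes with m > 1 generally rely on more involved codes such as Reed-Solomon. The function names encode, reconstruct, and xor_blocks are hypothetical.

```python
from functools import reduce
from typing import Optional


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equally sized blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal blocks (zero-padded) and append one XOR parity block."""
    block_size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(block_size * k, b"\x00")
    blocks = [padded[i * block_size:(i + 1) * block_size] for i in range(k)]
    return blocks + [xor_blocks(blocks)]


def reconstruct(blocks: list[Optional[bytes]]) -> list[bytes]:
    """Rebuild at most one missing block (marked None) from the survivors."""
    missing = [i for i, b in enumerate(blocks) if b is None]
    if len(missing) > 1:
        raise ValueError("single XOR parity can recover only one lost block")
    result = list(blocks)
    if missing:
        survivors = [b for b in blocks if b is not None]
        # XOR of all surviving blocks (data + parity) equals the lost block.
        result[missing[0]] = xor_blocks(survivors)
    return result


original = b"erasure coding demo!"   # 20 bytes -> 4 blocks of 5 bytes
stored = encode(original, k=4)       # 4 data blocks + 1 parity block
damaged = list(stored)
damaged[2] = None                    # simulate a failed drive
recovered = reconstruct(damaged)
restored = b"".join(recovered[:4])[:len(original)]
print(restored == original)          # prints: True
```

With one parity block, losing a second block makes recovery impossible, which is why larger m values need a stronger code than plain XOR.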

Benefits of Erasure Coding


  • High fault tolerance with less overhead than mirroring: Compared to 3-way mirrors, erasure coding uses significantly less extra storage while still protecting against multiple simultaneous failures. For example, a 6+3 scheme adds 50% extra storage, while a 3-way mirror adds 200%.
  • Efficient use of storage capacity: Unlike RAID 10, erasure coding offers strong protection without duplicating all data, which makes it well suited to cost-sensitive deployments with large volumes.
  • Geographic resilience in distributed systems: Because blocks are spread across nodes or sites, erasure coding supports global redundancy and disaster recovery use cases.
  • Suitable for object and cold data storage: Erasure coding works well with object stores such as Ceph, MinIO, or S3-compatible backends, where performance is less critical than availability and durability.
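
To make the capacity benefit concrete, here is a small arithmetic sketch comparing a 3-way mirror with a 6+3 scheme. The 100 TB figure and the function name raw_capacity_needed are assumptions chosen for illustration; real systems add further overhead (metadata, spare space) that is ignored here.

```python
def raw_capacity_needed(usable_tb: float, data_blocks: int, redundancy_blocks: int) -> float:
    """Raw capacity required to store usable_tb with the given layout."""
    return usable_tb * (data_blocks + redundancy_blocks) / data_blocks


usable_tb = 100.0  # hypothetical amount of data to protect

# A 3-way mirror is modeled as 1 data copy plus 2 redundant copies.
mirror_3way = raw_capacity_needed(usable_tb, data_blocks=1, redundancy_blocks=2)
ec_6_3 = raw_capacity_needed(usable_tb, data_blocks=6, redundancy_blocks=3)

print(f"3-way mirror: {mirror_3way:.0f} TB raw, survives 2 lost copies")
print(f"6+3 erasure coding: {ec_6_3:.0f} TB raw, survives 3 lost blocks")
# prints:
# 3-way mirror: 300 TB raw, survives 2 lost copies
# 6+3 erasure coding: 150 TB raw, survives 3 lost blocks
```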

Drawbacks and Considerations


  • Higher CPU and memory usage: Encoding and decoding require compute resources, which can affect performance on systems with limited processing capacity.
  • Increased latency for small or random reads: Reconstructing data on the fly introduces delays, making it less suitable for transactional or latency-sensitive workloads.
  • Not ideal for high-performance workloads: Systems with intensive read/write IOPS may be better served by RAID or mirrored storage unless optimized hardware is available.
  • Complexity in configuration and recovery: Understanding and managing k+m schemes, failure domains, and rebalance operations requires specialized knowledge or automation.
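
One example of the configuration questions involved: whether a k + m layout survives the loss of an entire failure domain depends on how many blocks each domain holds. The sketch below uses a simple round-robin placement and hypothetical rack names. It is not the placement algorithm of any particular product.

```python
from collections import Counter


def place_blocks(total_blocks: int, domains: list[str]) -> list[str]:
    """Assign each block index to a failure domain in round-robin order."""
    return [domains[i % len(domains)] for i in range(total_blocks)]


def survives_any_single_domain_loss(placement: list[str], m: int) -> bool:
    """True if no single domain holds more than m blocks."""
    return max(Counter(placement).values()) <= m


k, m = 6, 3
racks = ["rack-a", "rack-b", "rack-c"]  # hypothetical failure domains

three_racks = place_blocks(k + m, racks)      # 3 blocks per rack
two_racks = place_blocks(k + m, racks[:2])    # 5 and 4 blocks per rack

print(survives_any_single_domain_loss(three_racks, m))  # prints: True
print(survives_any_single_domain_loss(two_racks, m))    # prints: False
```

With only two racks, losing either one removes more than m = 3 blocks, so a 6+3 scheme cannot tolerate a rack failure in that layout.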

Erasure Coding and Open‑E JovianDSS


Currently, Open-E JovianDSS does not support erasure coding natively. Instead, it uses ZFS-based RAID levels (RAID-Z1/Z2/Z3 and mirrors) for redundancy and data protection.

These technologies provide:

  • Self-healing and checksummed storage
  • High performance for both random and sequential I/O
  • Inline compression, deduplication, and snapshots
  • Efficient block-level replication for offsite protection


While erasure coding is a feature of some object storage systems and hyperscale infrastructures, Open-E JovianDSS focuses on enterprise-grade storage performance and reliability with mature ZFS capabilities.