Replication, as a way of creating an exact data copy between the source storage location and the target location, is often erroneously put into one bag with backup. Despite of the fact that in both cases we can talk about ‘data copies’ – backup and replication cannot be treated as synonyms.
Among many technologies that increase data storage protection, the term ‘replication’ is used very frequently.
When you use block or file-based replication, a mirrored copy of the data is created in a location different from the parent data. In case of file-based replication there is an option to keep deleted files on the destination location. Also, a limited versioning can be achieved with file-based replication. It can be done with multiple replication tasks by copying the source data into a few different locations. For example, every Monday the replication task will copy the data to the first sub-directory, Tuesday to the second, etc. With such setup, 7 days behind will be preserved but the disadvantage is significantly much higher storage space required. BTW, this may be fixed with deduplication.
Depending on the software, the target location can be usually found on the remote computer, on a portable disk connected to the local computer or, in case of file-based replication, just a directory (e.g. a mounted NFS or SMB).
Now, let’s analyze three days from the life of an administrator who wants to perform a backup through replication.
Diary of an administrator: Day 1
Today, I used replication to create a mirrored copy of the data, and now I can enjoy the fact that if my source file fails – I have a copy of all data on the backup server. Unfortunately, on the same day I received a notification from the secretary, Mrs. Christie, that she unintentionally deleted important files. I took it easy because after all, I had the data copy. Imagine my disappointment after I searched for the desired files on the backup server and found nothing. ;-(
During data replication all data is being synchronized. In the situation described above, the administrator did not find a copy of the deleted files from the source server, because once they were deleted by Mrs. Christie, the absence of those files was also replicated to the target server. In other words, in case of block-based replication , the deletion of the files from the source server also resulted in the removal from the target server. Moreover, in case of block replication, any errors that occur in the hardware layer (e.g. damaged disk) can result in damage of the file system and are replicated to the backup server. In the end, you can accidentally become an owner of two damaged and useless mirrored copies. This situation also applies to the file-based replication, which in case of hardware layer or file system errors means replication of the damaged files. Let’s take a look at the next note from the diary of our unlucky administrator.
Diary of an administrator: Day 2
Thankfully, the company didn’t collapse because of my inability to recover the lost files. But today there was yet another disaster. Mrs. Christie from the secretary’s office asked for a copy of her files from two weeks ago. Ugh!
Yes, replication by itself does not perform data versioning, but just simply copies the changes from one place to another. Multiple file versioning is the domain of the traditional backup or snapshot technology. It can be relatively easy to achieve replicated data versioning if we combine replication with one of the technologies (or both). For example, when snapshots are performed within a one-hour interval, it is possible to restore your data to the state of the required time. On the other hand, when we use backup combined with replication or/and snapshots, the leeway tends to expand even more.
Diary of an administrator: Day 3
I was not able to explain to Mrs. Christie that I don’t have a copy of files from two weeks ago. She wasn’t happy. On top of that, the main data server gave up together with all the data. So, I swapped the broken master server for the backup server, which contained a previously replicated copy of the data. Everybody breathed a sigh of relief that the data has become available again.
And that is how the story of our administrator ends. Of course, it has been presented in an oversimplified way. Pleased with a ‘happy end’ of this story, I shall move on to some conclusions.
Replication can be compared (assuming a constant synchronization) to a kind of RAID 1 network. In case of a failure of one of the replicated servers – replication can protect us from data loss. However, it is not a backup. Nonetheless, we can ‘arm’ the replicated data during backup performance. Thanks to this, we gain fully operable copies of the data from the server and a mirror copy of all data on two servers.
In future articles you will be able to read about using simple file-based replication with file versioning.