We all know that caching data usually improves performance, but I am not sure we are all aware of the risks that caching introduces and of how to minimize them. That is why I decided to write this short article.
Let us analyze a setup in which our OS uses a device (HDD) connected via iSCSI. The server that provides this device as a logical unit (LU) over iSCSI has a RAID controller. The LU attached to our OS, which I will simply call the device, has been formatted with NTFS. A simplified communication scheme therefore looks like this: client OS (device formatted with NTFS) <-> iSCSI initiator <-> iSCSI target <-> RAID controller.
Now we can take a closer look at the RAID controller configuration. Most hardware RAID controllers provide caching. Please keep in mind that we are analyzing a configuration where volatile RAM is used for the cache. This feature goes by many names: Write Back (WB) cache, Unit Write Cache or just Cache. Unfortunately, many RAID controllers offer this function even if they do not have a BBU (Battery Backup Unit). Why is a BBU necessary when the cache is used at the RAID controller level? Let's see: the OS writes the data to the device connected through iSCSI and waits for confirmation that the operation has finished successfully. The iSCSI initiator sends the data via the LUN to the iSCSI target, which sends the data to the RAID device. This is the critical point, because the iSCSI target gets confirmation from the RAID controller that the data has been written successfully and sends this information back to the iSCSI initiator, which passes it on to the OS. However, at this moment the data is not yet on the disk drives connected to the RAID controller, only in the cache. So if the power supply fails at this very moment, we lose the data. To minimize the risk of losing data in this case, it is necessary to use a UPS for the whole server machine, and best of all to use a RAID controller with a BBU. This maximizes data protection without sacrificing performance.
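The write sequence above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the class and method names (WriteBackController, power_loss and so on) are made up for illustration and do not correspond to any real controller firmware or API, and the BBU is modeled simply as "the cache content survives and reaches the disks once power returns".

```python
class WriteBackController:
    """Toy model of a RAID controller with a volatile write-back cache (hypothetical)."""

    def __init__(self, has_bbu):
        self.has_bbu = has_bbu
        self.cache = {}  # volatile RAM: block number -> data
        self.disk = {}   # persistent storage: block number -> data

    def write(self, block, data):
        # Write-back: the data is stored in the cache only and the write is
        # acknowledged immediately, before anything reaches the disks.
        self.cache[block] = data
        return True

    def flush(self):
        # Background destaging: move cached blocks to the disks.
        self.disk.update(self.cache)
        self.cache.clear()

    def power_loss(self):
        if self.has_bbu:
            # Assumption: the battery keeps the cache alive, so its content
            # can be written to the disks when power returns.
            self.flush()
        else:
            # Without a BBU the acknowledged but unflushed data is gone.
            self.cache.clear()


# The same acknowledged write, with and without a BBU.
without_bbu = WriteBackController(has_bbu=False)
acknowledged = without_bbu.write(7, b"payroll")
without_bbu.power_loss()
lost = acknowledged and 7 not in without_bbu.disk

with_bbu = WriteBackController(has_bbu=True)
with_bbu.write(7, b"payroll")
with_bbu.power_loss()
kept = 7 in with_bbu.disk
```

In both runs the OS received the same positive confirmation; only the controller with the BBU still has the block after the power failure.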
The second level of the described scenario where a cache may be used, and may introduce a risk of data loss, is the configuration of the LU in the iSCSI target. As with the RAID configuration, we can enable WB in the LU configuration when adding the LU to the iSCSI target or, put the other way round, turn off Write Through (WT, the opposite of WB). The write sequence and the wait for confirmation are similar but shorter: the OS writes the data to the device connected through iSCSI and waits for confirmation that the operation has finished successfully. The iSCSI initiator sends the data via the LUN to the iSCSI target, which immediately sends confirmation back to the iSCSI initiator that the write operation has finished successfully. In this case only a UPS can minimize the risk of data loss.
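The rules of thumb for the two server-side cache levels can be written down as a small checklist. The sketch below is not a real configuration tool: StorageConfig and unprotected_cache_layers are hypothetical names, and the logic encodes only the statements of this article (a write-back cache in the iSCSI target is protected only by a UPS, a write-back cache on the RAID controller by a BBU or a UPS).

```python
from dataclasses import dataclass


@dataclass
class StorageConfig:
    """Hypothetical description of the server-side cache settings."""
    raid_write_back: bool    # WB cache enabled on the RAID controller
    raid_has_bbu: bool       # RAID controller has a Battery Backup Unit
    target_write_back: bool  # WB enabled for the LU in the iSCSI target
    server_has_ups: bool     # whole server machine is behind a UPS


def unprotected_cache_layers(cfg):
    """Return the layers where acknowledged writes may be lost on power failure."""
    layers = []
    # The target cache lives in server RAM, so only a UPS helps here.
    if cfg.target_write_back and not cfg.server_has_ups:
        layers.append("iSCSI target")
    # The controller cache is covered by a BBU or, failing that, by a UPS.
    if cfg.raid_write_back and not (cfg.raid_has_bbu or cfg.server_has_ups):
        layers.append("RAID controller")
    return layers


# Example: WB enabled everywhere, BBU present, but no UPS.
example = unprotected_cache_layers(
    StorageConfig(raid_write_back=True, raid_has_bbu=True,
                  target_write_back=True, server_has_ups=False)
)
```

For this example configuration the list contains only the iSCSI target, because the BBU covers the controller cache but not the RAM used by the target.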
Of course, I will not cover the various combinations of redundant power supplies and UPS units here, because that is not the goal of this article.
Let's take a closer look at the device formatted with NTFS and connected to the OS through the iSCSI initiator. Several times customers have reported the following problem to me: they wrote data to the device, then created a snapshot on the server and backed that snapshot up to tape. A few months later they could not find the changed data on the tapes! This is because NTFS, like other filesystems, uses a cache that is flushed to disk every few seconds. So we have at least two solutions here. The first is to wait a few seconds before starting the backup of the LU on the server side. The second is to use software that flushes the NTFS cache to the device on demand; such software can be found here. If we are using a Linux/UNIX OS and another filesystem in an iSCSI environment similar to the one described above, we can use the system-provided sync utility to get the same result.
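An application that writes the data itself can also ask the OS to flush its own file before the snapshot is taken. The sketch below uses Python's standard os.fsync call; the function name write_durably and the file name report.dat are just examples. Note the limit of this approach: it only pushes the data out of the client OS cache to the device, so a write-back cache in the iSCSI target or on the RAID controller may still hold the data in RAM.

```python
import os
import tempfile


def write_durably(path, data):
    """Write data to path and ask the OS to flush it to the device."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # empty Python's own userspace buffer
        os.fsync(f.fileno())  # ask the OS to write its cached data for this file


def flush_everything():
    """Flush all filesystem buffers, like the sync utility (Unix only)."""
    if hasattr(os, "sync"):   # os.sync is not available on Windows
        os.sync()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        target = os.path.join(directory, "report.dat")
        write_durably(target, b"important data")
        flush_everything()
```

Calling write_durably before creating the snapshot on the server removes the need to guess how many seconds to wait for the filesystem cache.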
The conclusion of this article: always analyze the potential risks of data loss and minimize them as much as possible by using an alternative power source, and always make sure that the important data you back up is consistent. Good luck!