There is plenty of talk about Bonding and Multipath IO, but it is very difficult to get solid information about either one. Typically what documentation can be found is very bulky and the most important practical questions go unanswered.
As a result the following questions are often heard:
When should I use bonding and when should I use multipath?
I was expecting better throughput with bonding, why am I not seeing this?
My RAID array shows 400MB/sec with local test, how can I get 400/MB sec outside?
Before we answer the above questions, let’s understand first how MPIO and bonding works.
MPIO allows a server with multiple NICs to transmit and receive I/O across all available interfaces to a corresponding MPIO-enabled server. If a server has two 1Gb NICs and the storage server has two 1Gb NICs, the theoretical maximum throughput would be about 200 MB/s.
Link aggregation (LACP, 802.3ad, etc.) via NIC teaming does not work the same way as MPIO. Link aggregation does not improve the throughput of a single I/O flow. A single flow will always traverse only one path. The benefit of link aggregation is seen when several “unique” flows exist, each from different source. Each individual flow will be sent down its own available NIC interface which is determined by a hash algorithm. Thus with more unique flows, more NICs will provide greater aggregate throughput. Link aggregation will not provide improved throughput for iSCSI, although it does provide a degree of redundancy.
Bonding works between a server and switch. Numerous workstations using each using a single NIC connected to the switch will benefit from bonded connections between the switch and storage server.
MPIO works between a storage server and the client server, whether or not there is a switch in the path.
With these basics fact, it will now be easier to answer our questions.
Q: When do I need bonding, and when is multipath appropriate?
A: Bonding works for a NAS server with multiple workstations connected.
Q: I was expecting better throughput with bonding, in my case why does it not work?
A: Let us consider 4 workstations and a storage server. Bonding will increase performance only if the single 1 GB connection between the storage server and switch is a bottleneck. This can happen when total aggregate benchmark of all four workstations is a bit above 100 MB/sec. In such a case it is possible that the RAID array can move more data and the bottleneck is the single 1Gb NIC. If we add a second NIC and create bonding the total aggregate performance may improve. This will happen if the RAID array is fast enough to move more than 100MB/sec with four streams. However, as more streams are active, the RAID array performance may drop because the performance test pattern will be closer to random and not sequential. The sequential test pattern is valid for one single data stream only. In a sequential pattern performance test we can observe a very high data rate, but in random patterns it can be very low. This is because the hard disks need to make frequent seeks. In practical applications it is very difficult to make effective use of more than 2 NICs for bonding.
Q: My RAID array shows 400MB/sec with a local sequential benchmark test, how can I get 400/MB sec outside?
A: It is rather impossible to get these kinds of results with NAS and bonding, because bonding works with multiple workstations and this will result in predominantly random patterns. The same local test but with random patterns may drop well below 100MB/s and then even a single NIC will be not the bottleneck – instead it will be the RAID array (hard disks). With MPIO we can keep sequential pattern performance because it can operate with a single data flow. Thus, with 2 NICs in MPIO we can observe 200 MB/sec, with 4 NIC’s, 400MB/sec.