
                   Linux Ethernet Bonding Driver mini-howto

Initial release : Thomas Davis <tadavis at lbl.gov>
Corrections, HA extensions : 2000/10/03-15 :
  - Willy Tarreau <willy at meta-x.org>
  - Constantine Gavrilov <const-g at xpert.com>
  - Chad N. Tindel <ctindel at ieee dot org>
  - Janice Girouard <girouard at us dot ibm dot com>
  - Jay Vosburgh <fubar at us dot ibm dot com>

Note :
------
The bonding driver originally came from Donald Becker's beowulf patches for
kernel 2.0. It has changed quite a bit since, and the original tools from
extreme-linux and beowulf sites will not work with this version of the driver.

For new versions of the driver, patches for older kernels and the updated
userspace tools, please follow the links at the end of this file.


Table of Contents
=================

Installation
Bond Configuration
Module Parameters
Configuring Multiple Bonds
Configuring Multiple ARP Targets
Potential Problems When Using ARP Monitor
Switch Configuration
Verifying Bond Configuration
Frequently Asked Questions
High Availability
Promiscuous Sniffing notes
8021q VLAN support
Limitations
Resources and Links


Installation
============

1) Build kernel with the bonding driver
---------------------------------------
For the latest version of the bonding driver, use kernel 2.4.12 or above
(otherwise you will need to apply a patch).

Configure the kernel with `make menuconfig', `make xconfig' or `make config',
and select "Bonding driver support" in the "Network device support" section.
It is recommended to configure the driver as a module, since that is
currently the only way to pass parameters to the driver and to configure
more than one bonding device.

Build and install the new kernel and modules.

2) Get and install the userspace tools
--------------------------------------
This version of the bonding driver requires an updated ifenslave program. The
original one from extreme-linux and beowulf will not work. Kernels 2.4.12
and above include the updated version of ifenslave.c in the
Documentation/networking directory. For older kernels, please follow the
links at the end of this file.

IMPORTANT!!!  If you are running Red Hat 7.1 or later, you need to be
careful because /usr/include/linux is no longer a symbolic link to
/usr/src/linux/include/linux.  If you build ifenslave against the headers in
/usr/include/linux, the build will appear to succeed but your bond won't
work.  The purpose of the -I option on the ifenslave compile line is to make
sure it uses /usr/src/linux/include/linux/if_bonding.h instead of the version
from /usr/include/linux.

To compile and install ifenslave, do:
    # gcc -Wall -Wstrict-prototypes -O -I/usr/src/linux/include ifenslave.c -o ifenslave
    # cp ifenslave /sbin/ifenslave


Bond Configuration
==================

You will need to add at least the following line to /etc/modules.conf
so the bonding driver will automatically load when the bond0 interface is
configured. Refer to the modules.conf manual page for specific modules.conf
syntax details. The Module Parameters section of this document describes each
bonding driver parameter.

	alias bond0 bonding

Use standard distribution techniques to define the bond0 network interface. For
example, on modern Red Hat distributions, create an ifcfg-bond0 file in
the /etc/sysconfig/network-scripts directory that resembles the following:

DEVICE=bond0
IPADDR=192.168.1.1
NETMASK=255.255.255.0
NETWORK=192.168.1.0
BROADCAST=192.168.1.255
ONBOOT=yes
BOOTPROTO=none
USERCTL=no

(use appropriate values for your network above)

All interfaces that are part of a bond should have SLAVE and MASTER
definitions. For example, in the case of Red Hat, if you wish to make eth0 and
eth1 a part of the bonding interface bond0, their config files (ifcfg-eth0 and
ifcfg-eth1) should resemble the following:

DEVICE=eth0
USERCTL=no
ONBOOT=yes
MASTER=bond0
SLAVE=yes
BOOTPROTO=none

Use DEVICE=eth1 in the ifcfg-eth1 config file. If you configure a second
bonding interface (bond1), use MASTER=bond1 in the config file to make the
network interface a slave of bond1.
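If you manage many hosts, these files can be generated rather than typed.
The following Python sketch uses hypothetical helpers (bond_ifcfg and
slave_ifcfg are illustrative names, not part of any distribution tool). It
reproduces the key order of the examples above and shows how NETWORK and
BROADCAST follow from IPADDR and NETMASK, assuming a plain IPv4 subnet:

```python
import ipaddress


def bond_ifcfg(device, ipaddr, netmask):
    """Render an ifcfg file for a bonding master (hypothetical helper)."""
    # strict=False accepts a host address; the subnet is derived from it.
    net = ipaddress.IPv4Network(f"{ipaddr}/{netmask}", strict=False)
    lines = [
        f"DEVICE={device}",
        f"IPADDR={ipaddr}",
        f"NETMASK={netmask}",
        f"NETWORK={net.network_address}",
        f"BROADCAST={net.broadcast_address}",
        "ONBOOT=yes",
        "BOOTPROTO=none",
        "USERCTL=no",
    ]
    return "\n".join(lines) + "\n"


def slave_ifcfg(device, master):
    """Render an ifcfg file that makes `device` a slave of `master`."""
    lines = [
        f"DEVICE={device}",
        "USERCTL=no",
        "ONBOOT=yes",
        f"MASTER={master}",
        "SLAVE=yes",
        "BOOTPROTO=none",
    ]
    return "\n".join(lines) + "\n"
```

Writing bond_ifcfg("bond0", "192.168.1.1", "255.255.255.0") to
ifcfg-bond0 and slave_ifcfg("eth0", "bond0") to ifcfg-eth0 yields the
example files shown above.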

Restart the networking subsystem or just bring up the bonding device if your
administration tools allow it. Otherwise, reboot. On Red Hat distros you can
issue `ifup bond0' or `/etc/rc.d/init.d/network restart'.

If the administration tools of your distribution do not support
master/slave notation in configuring network interfaces, you will need to
manually configure the bonding device with the following commands:

    # /sbin/ifconfig bond0 192.168.1.1 netmask 255.255.255.0 \
      broadcast 192.168.1.255 up

    # /sbin/ifenslave bond0 eth0
    # /sbin/ifenslave bond0 eth1

(use appropriate values for your network above)

You can then create a script containing these commands and place it in the
appropriate rc directory.

If you specifically need all network drivers loaded before the bonding driver,
adding the following line to modules.conf will cause the network drivers for
eth0 and eth1 to be loaded before the bonding driver.

probeall bond0 eth0 eth1 bonding

Be careful not to reference bond0 itself at the end of the line, or modprobe
will die in an endless recursive loop.

If running SNMP agents, the bonding driver should be loaded before any network
drivers participating in a bond. This requirement is due to the interface
index (ipAdEntIfIndex) being associated with the first interface found with a
given IP address. That is, there is only one ipAdEntIfIndex for each IP
address. For example, if eth0 and eth1 are slaves of bond0 and the driver for
eth0 is loaded before the bonding driver, the interface for the IP address
will be associated with the eth0 interface. This configuration is shown below;
the IP address 192.168.1.1 has an interface index of 2, which indexes to eth0
in the ifDescr table (ifDescr.2).

     interfaces.ifTable.ifEntry.ifDescr.1 = lo
     interfaces.ifTable.ifEntry.ifDescr.2 = eth0
     interfaces.ifTable.ifEntry.ifDescr.3 = eth1
     interfaces.ifTable.ifEntry.ifDescr.4 = eth2
     interfaces.ifTable.ifEntry.ifDescr.5 = eth3
     interfaces.ifTable.ifEntry.ifDescr.6 = bond0
     ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.10.10.10 = 5
     ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.192.168.1.1 = 2
     ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.74.20.94 = 4
     ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.127.0.0.1 = 1

This problem is avoided by loading the bonding driver before any network
drivers participating in a bond. Below is an example of loading the bonding
driver first; the IP address 192.168.1.1 is correctly associated with
ifDescr.2, which is now bond0.

     interfaces.ifTable.ifEntry.ifDescr.1 = lo
     interfaces.ifTable.ifEntry.ifDescr.2 = bond0
     interfaces.ifTable.ifEntry.ifDescr.3 = eth0
     interfaces.ifTable.ifEntry.ifDescr.4 = eth1
     interfaces.ifTable.ifEntry.ifDescr.5 = eth2
     interfaces.ifTable.ifEntry.ifDescr.6 = eth3
     ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.10.10.10 = 6
     ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.192.168.1.1 = 2
     ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.74.20.94 = 5
     ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.127.0.0.1 = 1

While some distributions may not report the interface name in ifDescr,
the association between the IP address and IfIndex remains and SNMP
functions such as Interface_Scan_Next will report that association.
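The effect of load order can be modelled in a few lines of Python. This is a
simplified model, not how any particular SNMP agent is written. It assumes
that interfaces are numbered 1, 2, ... in the order their drivers register,
that the slaves carry the bond's IP address (as in the ifconfig output under
Verifying Bond Configuration), and that the agent reports the lowest matching
index. assign_ifindex and ip_ifindex are hypothetical names.

```python
def assign_ifindex(load_order):
    """Number interfaces 1, 2, ... in the order their drivers registered them."""
    return {name: index for index, name in enumerate(load_order, start=1)}


def ip_ifindex(load_order, ip_owners, ip):
    """ifIndex reported for `ip`: the lowest index among interfaces holding it."""
    index = assign_ifindex(load_order)
    return min(index[name] for name in ip_owners[ip])


# Slaves carry the bond's address, so 192.168.1.1 appears on three interfaces.
IP_OWNERS = {
    "127.0.0.1": ["lo"],
    "192.168.1.1": ["bond0", "eth0", "eth1"],
    "10.74.20.94": ["eth2"],
    "10.10.10.10": ["eth3"],
}

eth_first = ["lo", "eth0", "eth1", "eth2", "eth3", "bond0"]   # first table
bond_first = ["lo", "bond0", "eth0", "eth1", "eth2", "eth3"]  # second table
```

In both load orders 192.168.1.1 maps to index 2, but that index names eth0
in the first case and bond0 in the second, which is exactly the difference
between the two tables.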


Module Parameters
=================

Optional parameters for the bonding driver can be supplied as command line
arguments to the insmod command. Typically, these parameters are specified in
the file /etc/modules.conf (see the manual page for modules.conf). The
available bonding driver parameters are listed below. If a parameter is not
specified, the default value is used. When initially configuring a bond, it
is recommended that "tail -f /var/log/messages" be run in a separate window
to watch for bonding driver error messages.

It is critical that either the miimon parameter or the arp_interval and
arp_ip_target parameters be specified; otherwise, serious network degradation
will occur during link failures.
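Before loading the driver, a planned options line can be checked against
this rule and against the arp_interval, arp_ip_target and miimon descriptions
below. The following Python sketch is a hypothetical pre-flight check, not
part of the driver or of ifenslave, and the driver's own validation may
differ. It only encodes what this document states: at least one monitoring
method, at least one ARP target when arp_interval is set, no more than 16
targets, and no mixing of ARP monitoring with miimon.

```python
import ipaddress

MAX_ARP_TARGETS = 16  # limit stated in the arp_ip_target description


def parse_arp_ip_target(value):
    """Split a comma-separated arp_ip_target value into dotted-quad strings."""
    targets = []
    for item in value.split(","):
        item = item.strip()
        if item:
            # Raises ValueError for anything that is not a valid IPv4 address.
            targets.append(str(ipaddress.IPv4Address(item)))
    if len(targets) > MAX_ARP_TARGETS:
        raise ValueError(f"at most {MAX_ARP_TARGETS} targets allowed, got {len(targets)}")
    return targets


def check_link_monitoring(miimon=0, arp_interval=0, arp_ip_target=""):
    """Return warnings for a planned miimon/arp_interval/arp_ip_target combination."""
    targets = parse_arp_ip_target(arp_ip_target)
    problems = []
    if miimon == 0 and arp_interval == 0:
        problems.append("no link monitoring: set miimon or arp_interval")
    if arp_interval > 0 and not targets:
        problems.append("arp_interval needs at least one arp_ip_target")
    if miimon > 0 and arp_interval > 0:
        problems.append("ARP monitoring should not be combined with miimon")
    return problems
```

For example, check_link_monitoring(miimon=100) returns an empty list, while
check_link_monitoring() reports that no link monitoring is configured.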

arp_interval

        Specifies the ARP monitoring frequency in milliseconds.
        If ARP monitoring is used in a load-balancing mode (mode 0 or 2), the
        switch should be configured in a mode that evenly distributes packets
        across all links, such as round-robin. If the switch is configured to
        distribute the packets in an XOR fashion, all replies from the ARP
        targets will be received on the same link, which could cause the
        other team members to fail. ARP monitoring should not be used in
        conjunction with miimon. A value of 0 disables ARP monitoring. The
        default value is 0.

arp_ip_target

	Specifies the IP addresses to use when arp_interval is > 0. These
	are the targets of the ARP requests sent to determine the health of
	the link to the targets. Specify these values in ddd.ddd.ddd.ddd
	format. Multiple IP addresses must be separated by a comma. At least
	one IP address needs to be given for ARP monitoring to work. The
	maximum number of targets that can be specified is 16.

downdelay

        Specifies the delay time in milliseconds to disable a link after a
        link failure has been detected. This should be a multiple of the
        miimon value; otherwise the value will be rounded. The default value
        is 0.

lacp_rate

        Option specifying the rate at which we'll ask our link partner to
	transmit LACPDU packets in 802.3ad mode.  Possible values are:

	slow or 0
		Request partner to transmit LACPDUs every 30 seconds (default)

	fast or 1
		Request partner to transmit LACPDUs every 1 second

max_bonds

	Specifies the number of bonding devices to create for this
	instance of the bonding driver.  E.g., if max_bonds is 3, and
	the bonding driver is not already loaded, then bond0, bond1
	and bond2 will be created.  The default value is 1.

miimon

        Specifies the frequency in milliseconds that MII link monitoring
        will occur. A value of zero disables MII link monitoring. A value
        of 100 is a good starting point. See the High Availability section
        for additional information. The default value is 0.

mode

	Specifies one of the bonding policies. The default is
	round-robin (balance-rr).  Possible values are (you can use
	either the text or numeric option):

	balance-rr or 0

		Round-robin policy: Transmit in a sequential order
		from the first available slave through the last. This
		mode provides load balancing and fault tolerance.

	active-backup or 1

		Active-backup policy: Only one slave in the bond is
		active. A different slave becomes active if, and only
		if, the active slave fails. The bond's MAC address is
		externally visible on only one port (network adapter)
		to avoid confusing the switch.  This mode provides
		fault tolerance.

	balance-xor or 2

		XOR policy: Transmit based on [(source MAC address
		XOR'd with destination MAC address) modulo slave
		count]. This selects the same slave for each
		destination MAC address. This mode provides load
		balancing and fault tolerance.

	broadcast or 3

		Broadcast policy: transmits everything on all slave
		interfaces. This mode provides fault tolerance.

	802.3ad or 4

		IEEE 802.3ad Dynamic link aggregation. Creates aggregation
		groups that share the same speed and duplex settings.
		Transmits and receives on all slaves in the active
		aggregator.

		Prerequisites:

		1. Ethtool support in the base drivers for retrieving the
		speed and duplex of each slave.

		2. A switch that supports IEEE 802.3ad Dynamic link
		aggregation.

	balance-tlb or 5

		Adaptive transmit load balancing: channel bonding that does
		not require any special switch support. The outgoing
		traffic is distributed according to the current load
		(computed relative to the speed) on each slave. Incoming
		traffic is received by the current slave. If the receiving
		slave fails, another slave takes over the MAC address of
		the failed receiving slave.

		Prerequisite:

		Ethtool support in the base drivers for retrieving the
		speed of each slave.

	balance-alb or 6

		Adaptive load balancing: includes balance-tlb plus receive
		load balancing (rlb) for IPV4 traffic, and does not require
		any special switch support. The receive load balancing is
		achieved by ARP negotiation. The bonding driver intercepts
		the ARP Replies sent by the server on their way out and
		overwrites the src hw address with the unique hw address of
		one of the slaves in the bond such that different clients
		use different hw addresses for the server.

		Receive traffic from connections created by the server is
		also balanced. When the server sends an ARP Request, the
		bonding driver copies and saves the client's IP information
		from the ARP. When the ARP Reply arrives from the client,
		its hw address is retrieved and the bonding driver
		initiates an ARP reply to this client assigning it to one
		of the slaves in the bond. A problematic outcome of using
		ARP negotiation for balancing is that each time an ARP
		request is broadcast, it uses the hw address of the
		bond. Hence, clients learn the hw address of the bond and
		the balancing of receive traffic collapses to the current
		slave. This is handled by sending updates (ARP Replies) to
		all the clients with their assigned hw address such that
		the traffic is redistributed. Receive traffic is also
		redistributed when a new slave is added to the bond and
		when an inactive slave is re-activated. The receive load is
		distributed sequentially (round robin) among the group of
		highest speed slaves in the bond.

		When a link is reconnected or a new slave joins the bond,
		the receive traffic is redistributed among all active
		slaves in the bond by initiating ARP Replies with the
		selected mac address to each of the clients. The updelay
		module parameter must be set to a value equal to or greater
		than the switch's forwarding delay so that the ARP Replies
		sent to the clients will not be blocked by the switch.

		Prerequisites:

		1. Ethtool support in the base drivers for retrieving the
		speed of each slave.

		2. Base driver support for setting the hw address of a
		device even while it is open. This is required so that there
		will always be one slave in the team using the bond hw
		address (the curr_active_slave) while having a unique hw
		address for each slave in the bond. If the curr_active_slave
		fails, its hw address is swapped with the new curr_active_slave
		that was chosen.

primary

        A string (eth0, eth2, etc) that designates a slave as the primary
        device. If this value is entered, and the device is on-line, it will
        be used first as the output media. Only when this device is off-line
        will alternate devices be used. Without a primary, once a failover
        is detected and a new default output is chosen, it will remain the
        output media until it too fails. This is useful when one slave is
        preferred over another, e.g., when one slave is 1000Mbps and another
        is 100Mbps. If the 1000Mbps slave fails and is later restored, it may
        be preferable for the faster slave to gracefully become the active
        slave, without deliberately failing the 100Mbps slave. Specifying a
        primary is only valid in active-backup mode.

updelay

        Specifies the delay time in milliseconds to enable a link after a
        link up status has been detected. This should be a multiple of the
        miimon value; otherwise the value will be rounded. The default value
        is 0.

use_carrier

        Specifies whether or not miimon should use MII or ETHTOOL
        ioctls vs. netif_carrier_ok() to determine the link status.
        The MII or ETHTOOL ioctls are less efficient and utilize a
        deprecated calling sequence within the kernel.  The
        netif_carrier_ok() method relies on the device driver to maintain
        its state with netif_carrier_on/off; at this writing, most, but
        not all, device drivers support this facility.

        If bonding insists that the link is up when it should not be,
        it may be that your network device driver does not support
        netif_carrier_on/off.  This is because the default state for
        netif_carrier is "carrier on." In this case, disabling
        use_carrier will cause bonding to revert to the MII / ETHTOOL
        ioctl method to determine the link state.

        A value of 1 enables the use of netif_carrier_ok(), a value of
        0 will use the deprecated MII / ETHTOOL ioctls.  The default
        value is 1.
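The effect of the primary parameter on active-backup selection can be
sketched in Python. select_active is a hypothetical helper, not the driver's
code: the primary wins whenever its link is up; otherwise the current slave is
kept until it fails. When a failover is needed, the sketch picks the first
slave with link in enslaving order, which is an assumption, since the text
does not say which slave the driver chooses.

```python
def select_active(slaves, link_up, current=None, primary=None):
    """Choose the active slave in active-backup mode.

    slaves: names in enslaving order; link_up: name -> bool (from miimon).
    """
    if primary is not None and link_up.get(primary, False):
        return primary  # the primary is used whenever its link is up
    if current is not None and link_up.get(current, False):
        return current  # otherwise stay on the current slave until it fails
    for name in slaves:  # assumption: fail over to the first slave with link
        if link_up.get(name, False):
            return name
    return None  # no slave has link
```

With eth0 (1000Mbps) as primary and eth1 (100Mbps) as backup, a failure of
eth0 moves traffic to eth1, and restoring eth0 moves it back without eth1
having to fail. Without a primary, eth1 would stay active.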


Configuring Multiple Bonds
==========================

If several bonding interfaces are required, either specify the max_bonds
parameter (described above), or load the driver multiple times.  Using
the max_bonds parameter is less complicated, but has the limitation that
all bonding instances created will have the same options.  Loading the
driver multiple times allows each instance of the driver to have differing
options.

For example, to configure two bonding interfaces, one with MII link
monitoring performed every 100 milliseconds, and one with ARP link
monitoring performed every 200 milliseconds, /etc/modules.conf should
resemble the following:

alias bond0 bonding
alias bond1 bonding

options bond0 miimon=100
options bond1 -o bonding1 arp_interval=200 arp_ip_target=10.0.0.1
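For more than two bonds the same pattern can be generated. The Python
sketch below is a hypothetical helper; it assumes that every instance after
the first is loaded under a distinct module name via "-o bondingN",
following the bond1 line of the example. Check your modprobe documentation
before relying on that naming.

```python
def modules_conf_lines(bond_options):
    """Render alias/options lines for several bonds with differing options.

    bond_options: one option string per bond (bond0, bond1, ...).
    """
    aliases = [f"alias bond{i} bonding" for i in range(len(bond_options))]
    options = []
    for i, opts in enumerate(bond_options):
        # Assumption: instances after the first get their own module name.
        rename = "" if i == 0 else f"-o bonding{i} "
        options.append(f"options bond{i} {rename}{opts}")
    return aliases + [""] + options
```

Joining the result with newlines for ["miimon=100",
"arp_interval=200 arp_ip_target=10.0.0.1"] gives the example above.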

Configuring Multiple ARP Targets
================================

While ARP monitoring can be done with just one target, it can be useful
in a High Availability setup to have several targets to monitor. In the
case of just one target, the target itself may go down or have a problem
making it unresponsive to ARP requests. Having an additional target (or
several) increases the reliability of the ARP monitoring.

Multiple ARP targets must be separated by commas as follows:

# example options for ARP monitoring with three targets
alias bond0 bonding
options bond0 arp_interval=60 arp_ip_target=192.168.0.1,192.168.0.3,192.168.0.9

For just a single target, the options would resemble:

# example options for ARP monitoring with one target
alias bond0 bonding
options bond0 arp_interval=60 arp_ip_target=192.168.0.100

Potential Problems When Using ARP Monitor
=========================================

1. Driver support

The ARP monitor relies on the network device driver to maintain two
statistics: the last receive time (dev->last_rx), and the last
transmit time (dev->trans_start).  If the network device driver does
not update one or both of these, then the typical result will be that,
upon startup, all links in the bond will immediately be declared down,
and remain that way.  A network monitoring tool (tcpdump, for example) will
show ARP requests and replies being sent and received on the bonding
device.

The possible resolutions for this are to (a) fix the device driver, or
(b) discontinue the ARP monitor (using miimon as an alternative, for
example).
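The dependency on these two statistics can be sketched in Python. This is a
simplified model, not the driver's code: it assumes a slave counts as up only
when both its last transmit and its last receive fall within the last
arp_interval milliseconds, and the driver's actual timing windows may be
wider. arp_link_up is a hypothetical name.

```python
def arp_link_up(now_ms, last_rx_ms, trans_start_ms, arp_interval_ms):
    """Simplified ARP-monitor check for one slave.

    The slave is treated as up only if it both transmitted (dev->trans_start)
    and received (dev->last_rx) within the last arp_interval.
    """
    sent_recently = now_ms - trans_start_ms <= arp_interval_ms
    received_recently = now_ms - last_rx_ms <= arp_interval_ms
    return sent_recently and received_recently
```

If a driver never updates last_rx, that value stays at its initial setting,
the check fails on every interval, and the slave is declared down even though
tcpdump shows ARP replies arriving.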

2. Adventures in Routing

When bonding is set up with the ARP monitor, it is important that the
slave devices not have routes that supersede routes of the master (or,
generally, not have routes at all).  For example, suppose the bonding
device bond0 has two slaves, eth0 and eth1, and the routing table is
as follows:

Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
10.0.0.0        0.0.0.0         255.255.0.0     U        40 0          0 eth0
10.0.0.0        0.0.0.0         255.255.0.0     U        40 0          0 eth1
10.0.0.0        0.0.0.0         255.255.0.0     U        40 0          0 bond0
127.0.0.0       0.0.0.0         255.0.0.0       U        40 0          0 lo

In this case, the ARP monitor (and ARP itself) may become confused,
because ARP requests will be sent on one interface (bond0), but the
corresponding reply will arrive on a different interface (eth0).  This
reply looks to ARP like an unsolicited ARP reply (because ARP matches
replies on a per-interface basis), and is discarded.  This will likely
still update the receive/transmit times in the driver, but will lose
packets.

The resolution here is simply to ensure that slaves do not have routes
of their own, and if for some reason they must, that those routes do not
supersede routes of their master.  This should generally be the case,
but unusual configurations or errant manual or automatic static route
additions may cause trouble.
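Such routes can be found by comparing each slave route with the master's
routes. The Python sketch below is a hypothetical checker built on the
standard ipaddress module. It takes (destination, genmask, iface) tuples as
read from a routing table like the one above and flags slave routes that
cover the same or a narrower range than a route of the master.

```python
import ipaddress


def superseding_slave_routes(routes, master, slaves):
    """Return slave routes covering the same or a narrower range than a master route.

    routes: (destination, genmask, iface) tuples, as in a kernel routing table.
    """
    def network(dest, mask):
        return ipaddress.IPv4Network(f"{dest}/{mask}", strict=False)

    master_nets = [network(d, m) for d, m, iface in routes if iface == master]
    return [
        (d, m, iface)
        for d, m, iface in routes
        if iface in slaves and any(network(d, m).subnet_of(mn) for mn in master_nets)
    ]
```

Applied to the table above, it flags the eth0 and eth1 entries for
10.0.0.0/255.255.0.0 and leaves the bond0 and lo routes alone.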

Switch Configuration
====================

While the switch does not need to be configured when the active-backup,
balance-tlb or balance-alb policies (mode=1,5,6) are used, it does need to
be configured for the round-robin, XOR, broadcast, or 802.3ad policies
(mode=0,2,3,4).


Verifying Bond Configuration
============================

1) Bonding information files
----------------------------
The bonding driver information files reside in the /proc/net/bonding directory.

Sample contents of /proc/net/bonding/bond0 after the driver is loaded with
parameters of mode=0 and miimon=1000 are shown below.

        Bonding Mode: load balancing (round-robin)
        Currently Active Slave: eth0
        MII Status: up
        MII Polling Interval (ms): 1000
        Up Delay (ms): 0
        Down Delay (ms): 0

        Slave Interface: eth1
        MII Status: up
        Link Failure Count: 1

        Slave Interface: eth0
        MII Status: up
        Link Failure Count: 1
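When checking many hosts, it can help to parse this file. The Python sketch
below assumes the simple "Key: value" layout shown above, with each "Slave
Interface" line starting a new slave block; other driver versions may print
additional fields, which the sketch simply collects. parse_bonding_proc is a
hypothetical name.

```python
def parse_bonding_proc(text):
    """Parse /proc/net/bonding/bondN text into (bond_fields, [slave_fields, ...])."""
    bond, slaves, current = {}, [], None
    for raw in text.splitlines():
        line = raw.strip()
        if ":" not in line:
            continue  # skip blank lines
        key, value = (part.strip() for part in line.split(":", 1))
        if key == "Slave Interface":
            current = {key: value}  # each slave block starts with this line
            slaves.append(current)
        elif current is not None:
            current[key] = value  # field belonging to the current slave
        else:
            bond[key] = value  # field describing the bond itself
    return bond, slaves
```

On a live system, pass it the contents of /proc/net/bonding/bond0 and check,
for example, that every slave reports "MII Status" as "up".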

2) Network verification
-----------------------
The network configuration can be verified using the ifconfig command. In
the example below, the bond0 interface is the master (MASTER) while eth0 and
eth1 are slaves (SLAVE). Notice that all slaves of bond0 have the same MAC
address (HWaddr) as bond0 for all modes except TLB and ALB, which require a
unique MAC address for each slave.

[root]# /sbin/ifconfig
bond0     Link encap:Ethernet  HWaddr 00:C0:F0:1F:37:B4
          inet addr:XXX.XXX.XXX.YYY  Bcast:XXX.XXX.XXX.255  Mask:255.255.252.0
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:7224794 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3286647 errors:1 dropped:0 overruns:1 carrier:0
          collisions:0 txqueuelen:0

eth0      Link encap:Ethernet  HWaddr 00:C0:F0:1F:37:B4
          inet addr:XXX.XXX.XXX.YYY  Bcast:XXX.XXX.XXX.255  Mask:255.255.252.0
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:3573025 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1643167 errors:1 dropped:0 overruns:1 carrier:0
          collisions:0 txqueuelen:100
          Interrupt:10 Base address:0x1080

eth1      Link encap:Ethernet  HWaddr 00:C0:F0:1F:37:B4
          inet addr:XXX.XXX.XXX.YYY  Bcast:XXX.XXX.XXX.255  Mask:255.255.252.0
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:3651769 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1643480 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          Interrupt:9 Base address:0x1400
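The MAC address rule can be checked mechanically. The Python sketch below
takes the HWaddr values reported by ifconfig. For TLB and ALB it assumes,
following the balance-alb prerequisites in Module Parameters, that every
slave has a distinct address and that one of them carries the bond's
address. check_slave_macs is a hypothetical name.

```python
def check_slave_macs(bond_mac, slave_macs, mode):
    """Check ifconfig HWaddr values against the bonding mode.

    slave_macs: slave name -> MAC string; mode: name or number ("balance-rr", 0, ...).
    """
    macs = [mac.lower() for mac in slave_macs.values()]
    if mode in ("balance-tlb", "balance-alb", 5, 6):
        # TLB/ALB: every slave has its own address; one of them uses the bond's.
        return len(set(macs)) == len(macs) and bond_mac.lower() in macs
    # All other modes: every slave shares the bond's address.
    return all(mac == bond_mac.lower() for mac in macs)
```

For the round-robin output above, check_slave_macs("00:C0:F0:1F:37:B4",
{"eth0": "00:C0:F0:1F:37:B4", "eth1": "00:C0:F0:1F:37:B4"}, 0) returns True;
the same addresses would fail the check in balance-alb mode.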


Frequently Asked Questions
==========================

1.  Is it SMP safe?

	Yes. The old 2.0.xx channel bonding patch was not SMP safe.
	The new driver was designed to be SMP safe from the start.

2.  What type of cards will work with it?

	Any Ethernet type cards (you can even mix cards, for example an
	Intel EtherExpress PRO/100 and a 3com 3c905b).
	You can even bond together Gigabit Ethernet cards!

3.  How many bonding devices can I have?

	There is no limit.

4.  How many slaves can a bonding device have?

	Limited by the number of network interfaces Linux supports and/or the
	number of network cards you can place in your system.

5.  What happens when a slave link dies?

	If your Ethernet cards support MII or ETHTOOL link status monitoring
	and MII monitoring has been enabled in the driver (see the description
	of module parameters), there will be no adverse consequences. This
	release of the bonding driver knows how to get the MII information and
	enables or disables its slaves according to their link status.
	See the section on High Availability for additional information.

	For Ethernet cards not supporting MII status, the arp_interval and
	arp_ip_target parameters must be specified for bonding to work
	correctly. If packets have not been sent or received during the
	specified arp_interval duration, an ARP request is sent to the
	targets to generate send and receive traffic. If, after this
	interval, either the successful send and/or receive count has not
	incremented, the next slave in the sequence will become the active
	slave.

	If neither miimon nor arp_interval is configured, the bonding
	driver will not handle this situation very well. The driver will
	continue to send packets but some packets will be lost. Retransmits
	will cause serious degradation of performance (in the case when one
	of two slave links fails, 50% of packets will be lost, which is a
	serious problem for both TCP and UDP).

6.  Can bonding be used for High Availability?

	Yes, if you use MII monitoring and ALL your cards support MII link
	status reporting. See the section on High Availability for more
	information.

7.  Which switches/systems does it work with?

	In round-robin and XOR mode, it works with systems that support
	trunking:

	* Many Cisco switches and routers (look for EtherChannel support).
	* SunTrunking software.
	* Alteon AceDirector switches / WebOS (use Trunks).
	* BayStack Switches (trunks must be explicitly configured). Stackable
	  models (450) can define trunks between ports on different physical
	  units.
	* Linux bonding, of course !

	In 802.3ad mode, it works with systems that support IEEE 802.3ad
	Dynamic Link Aggregation:

	* Extreme Networks Summit 7i (look for link-aggregation).
	* Many Cisco switches and routers (look for LACP support; this may
	  require an upgrade to your IOS software; LACP support was added
	  by Cisco in late 2002).
	* Foundry Big Iron 4000

	In active-backup, balance-tlb and balance-alb modes, it should work
	with any Layer-II switch.


8.  Where does a bonding device get its MAC address from?

	If not explicitly configured with ifconfig, the MAC address of the
	bonding device is taken from its first slave device. This MAC address
	is then passed to all following slaves and remains persistent (even if
	the first slave is removed) until the bonding device is brought
	down or reconfigured.

	If you wish to change the MAC address, you can set it with ifconfig:

	  # ifconfig bond0 hw ether 00:11:22:33:44:55

	The MAC address can also be changed by bringing the device down and
	up again and then changing its slaves (or their order):

	  # ifconfig bond0 down ; modprobe -r bonding
	  # ifconfig bond0 .... up
	  # ifenslave bond0 eth...

	This method will automatically take the address from the next slave
	that will be added.

	To restore your slaves' MAC addresses, you need to detach them
	from the bond (`ifenslave -d bond0 eth0'). The bonding driver will then
	restore the MAC addresses that the slaves had before they were enslaved.

9.  Which transmit policies can be used?

	Round-robin: based on the order of enslaving, the output device
	is selected based on the next available slave, regardless of
	the source and/or destination of the packet.

	Active-backup policy that ensures that one and only one device will
	transmit at any given moment. Active-backup policy is useful for
	implementing high availability solutions using two hubs (see
	section on High Availability).

	XOR, based on (src hw addr XOR dst hw addr) % slave count. This
	policy selects the same slave for each destination hw address.

	Broadcast policy transmits everything on all slave interfaces.

	802.3ad, based on XOR but distributes traffic among all interfaces
	in the active aggregator.

	Transmit load balancing (balance-tlb) balances the traffic
	according to the current load on each slave. The balancing is
	client-based and the least loaded slave is selected for each new
	client. The load of each slave is calculated relative to its speed
	and enables load balancing in mixed speed teams.

	Adaptive load balancing (balance-alb) uses the Transmit load
	balancing for the transmit load. The receive load is balanced only
	among the group of highest speed active slaves in the bond. The
	load is distributed with round-robin, i.e. to the next available
	slave in the high speed group of active slaves.
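Several of these policies can be modelled in a few lines of Python. The
sketch is a simplification, not the driver's code: xor_slave treats each
whole MAC address as a 48-bit integer (the driver may hash only part of the
address), tlb_slave measures load in bits per second relative to link speed,
and alb_receive_slaves orders the fastest slaves by name (an arbitrary
choice) and models redistribution by simply recomputing the mapping. All
function names are hypothetical.

```python
from itertools import cycle


def rr_slave(packet_no, slaves):
    """balance-rr: slaves are used in turn, in enslaving order."""
    return slaves[packet_no % len(slaves)]


def xor_slave(src_mac, dst_mac, slaves):
    """balance-xor: (src hw addr XOR dst hw addr) % slave count."""
    src = int(src_mac.replace(":", ""), 16)
    dst = int(dst_mac.replace(":", ""), 16)
    return slaves[(src ^ dst) % len(slaves)]


def tlb_slave(load_bps, speed_mbps):
    """balance-tlb: the least loaded slave, with load taken relative to speed."""
    return min(load_bps, key=lambda name: load_bps[name] / (speed_mbps[name] * 1_000_000))


def alb_receive_slaves(clients, slaves):
    """balance-alb receive side: spread clients round-robin over the
    highest-speed active slaves. slaves: name -> (speed_mbps, is_active)."""
    active = {name: speed for name, (speed, up) in slaves.items() if up}
    if not active:
        return {}
    top = max(active.values())
    fastest = sorted(name for name, speed in active.items() if speed == top)
    turn = cycle(fastest)
    return {client: next(turn) for client in clients}
```

For example, a 100 Mbps slave carrying 20 Mbit/s (20% utilised) counts as
busier than a 1000 Mbps slave carrying 50 Mbit/s (5%), so tlb_slave picks
the faster slave for the next client; likewise a 100 Mbps slave receives no
ALB clients while a 1000 Mbps slave is active.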

High Availability
=================

To implement high availability using the bonding driver, the driver needs to be
compiled as a module, because currently it is the only way to pass parameters
to the driver. This may change in the future.

High availability is achieved by using MII or ETHTOOL status reporting. You
need to verify that all your interfaces support MII or ETHTOOL link status
reporting.  On Linux kernel 2.2.17, all the 100 Mbps capable drivers and the
yellowfin gigabit driver support MII. To determine if ETHTOOL link reporting
is available for interface eth0, type "ethtool eth0"; the "Link detected:"
line should contain the correct link status. If your system has an interface
that does not support MII or ETHTOOL status reporting, a failure of its link
will not be detected! A message indicating that MII and ETHTOOL are not
supported by a network driver is logged when the bonding driver is loaded
with a non-zero miimon value.

The bonding driver can regularly check all its slaves' links using the ETHTOOL
IOCTL (ETHTOOL_GLINK command) or by checking the MII status registers. The
check interval is specified by the module argument "miimon" (MII monitoring).
It takes an integer that represents the checking time in milliseconds. It
should not come too close to (1000/HZ) (10 milliseconds on i386) because
polling that often may reduce system interactivity. A value of 100 seems to
be a good starting point. It means that a dead link will be detected at most
100 milliseconds after it goes down.
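The time until a failed slave is actually disabled can be worked out with a
short Python sketch that combines this detection bound with the downdelay
parameter from Module Parameters. Two assumptions: a failure is noticed at the
next poll at the latest, and a downdelay that is not a multiple of miimon is
truncated to one (the parameter description only says it "will be rounded").
round_delay and worst_case_disable_ms are hypothetical names.

```python
def round_delay(delay_ms, miimon_ms):
    """Round a downdelay/updelay value to a whole number of miimon intervals.

    Assumption: truncation (integer division).
    """
    if miimon_ms <= 0:
        raise ValueError("miimon must be > 0 for MII link monitoring")
    return (delay_ms // miimon_ms) * miimon_ms


def worst_case_disable_ms(miimon_ms, downdelay_ms=0):
    """Longest time from a link failure until the slave is disabled:
    up to one polling interval to notice it, plus the (rounded) downdelay."""
    return miimon_ms + round_delay(downdelay_ms, miimon_ms)
```

With miimon=100 and no downdelay the bound is 100 ms, as stated above; under
the truncation assumption, downdelay=250 would act as 200 ms, giving 300 ms.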
729
730Example:
731
732   # modprobe bonding miimon=100
733
734Or, put the following lines in /etc/modules.conf:
735
736   alias bond0 bonding
737   options bond0 miimon=100
738
There are currently two policies for high availability. Which one applies
depends on whether:

   a) hosts are connected to a single host or switch that supports trunking

   b) hosts are connected to several different switches or a single switch that
      does not support trunking


1) High Availability on a single switch or host - load balancing
----------------------------------------------------------------
This is the easiest setup to configure and to understand. Simply configure the
remote equipment (host or switch) to aggregate traffic over several
ports (Trunk, EtherChannel, etc.) and configure the bonding interfaces.
If the module has been loaded with the proper MII option, it will work
automatically. You can then try to remove and restore different links
and see in your logs what the driver detects. When testing, you may
encounter problems with some buggy switches that disable the trunk for a
long time if all ports in a trunk go down. This is not a Linux problem but
really a switch one (reboot the switch to be sure).

Example 1 : host to host at twice the speed

          +----------+                          +----------+
          |          |eth0                  eth0|          |
          | Host A   +--------------------------+  Host B  |
          |          +--------------------------+          |
          |          |eth1                  eth1|          |
          +----------+                          +----------+

  On each host :
     # modprobe bonding miimon=100
     # ifconfig bond0 addr
     # ifenslave bond0 eth0 eth1

Example 2 : host to switch at twice the speed

          +----------+                          +----------+
          |          |eth0                 port1|          |
          | Host A   +--------------------------+  switch  |
          |          +--------------------------+          |
          |          |eth1                 port2|          |
          +----------+                          +----------+

  On host A :                             On the switch :
     # modprobe bonding miimon=100           # set up a trunk on port1
     # ifconfig bond0 addr                     and port2
     # ifenslave bond0 eth0 eth1


2) High Availability on two or more switches (or a single switch without
   trunking support)
---------------------------------------------------------------------------
This mode is more problematic: the host is connected through multiple ports,
but its MAC address must be visible on only one port at a time to avoid
confusing the switches.

If you need to know which interface is the active one and which ones are
backups, use ifconfig. All backup interfaces have the NOARP flag set.
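The NOARP rule above lends itself to a small filter over ifconfig output. This is a sketch that assumes the net-tools ifconfig layout (either the older "eth0  Link encap:..." style or the newer "eth0: flags=<...>" style) and that slaves carry the SLAVE flag; slave_roles is a name chosen for this example.

```shell
# Sketch (hypothetical helper): read `ifconfig` output on standard input and
# print "IFACE active" or "IFACE backup" for every interface with the SLAVE
# flag, treating a slave with the NOARP flag as a backup.
slave_roles() {
    awk '
        /^[a-zA-Z]/ { iface = $1; sub(/:$/, "", iface) }  # new interface block
        /SLAVE/     { print iface, (/NOARP/ ? "backup" : "active") }
    '
}
```

Usage: "ifconfig | slave_roles" prints lines such as "eth0 active" and "eth1 backup".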

To use this mode, pass "mode=active-backup" (or its numeric equivalent,
"mode=1") to the module at load time :

    # modprobe bonding miimon=100 mode=active-backup

	or:

    # modprobe bonding miimon=100 mode=1

Or, put in your /etc/modules.conf :

    alias bond0 bonding
    options bond0 miimon=100 mode=active-backup

Example 1: Using multiple hosts and multiple switches to build a "no single
point of failure" solution.


                |                                     |
                |port3                           port3|
          +-----+----+                          +-----+----+
          |          |port7       ISL      port7|          |
          | switch A +--------------------------+ switch B |
          |          +--------------------------+          |
          |          |port8                port8|          |
          +----++----+                          +-----++---+
          port2||port1                           port1||port2
               ||             +-------+               ||
               |+-------------+ host1 +---------------+|
               |         eth0 +-------+ eth1           |
               |                                       |
               |              +-------+                |
               +--------------+ host2 +----------------+
                         eth0 +-------+ eth1

In this configuration, there is an ISL (Inter Switch Link, which could be a
trunk), several servers (host1, host2 ...) each attached to both switches, and
one or more ports to the outside world (port3...). One and only one slave on
each host is active at a time, while all links are still monitored (the system
can detect a failure of active and backup links).

Each time a host changes its active interface, it sticks to the new one until
it goes down. In this example, the hosts are negligibly affected by the
expiration time of the switches' forwarding tables.

If host1 and host2 have the same functionality and are used in load balancing
by another external mechanism, it is good to have host1's active interface
connected to one switch and host2's to the other. Such a system will survive
a failure of a single host, cable, or switch. The worst thing that may happen
in the case of a switch failure is that half of the hosts will be temporarily
unreachable until the other switch expires its tables.

Example 2: Using multiple ethernet cards connected to a switch to configure
           NIC failover (switch is not required to support trunking).


          +----------+                          +----------+
          |          |eth0                 port1|          |
          | Host A   +--------------------------+  switch  |
          |          +--------------------------+          |
          |          |eth1                 port2|          |
          +----------+                          +----------+

  On host A :                                 On the switch :
     # modprobe bonding miimon=100 mode=1     # (optional) minimize the time
     # ifconfig bond0 addr                    # for table expiration
     # ifenslave bond0 eth0 eth1

Each time the host changes its active interface, it sticks to the new one until
it goes down. In this example, the host is strongly affected by the expiration
time of the switch forwarding table.


3) Adapting to your switches' timing
------------------------------------
If your switches take a long time to go into backup mode, it may be
desirable not to activate a backup interface immediately after a link goes
down. It is possible to delay the moment at which a link will be
completely disabled by passing the module parameter "downdelay" (in
milliseconds, must be a multiple of miimon).

When a switch reboots, it is possible that its ports report "link up" status
before they become usable. This could fool a bond device by causing it to
use some ports that are not ready yet. It is possible to delay the moment at
which an active link will be reused by passing the module parameter "updelay"
(in milliseconds, must be a multiple of miimon).

A similar situation can occur when a host re-negotiates a lost link with the
switch (for example, after a cable replacement).

A special case is when a bonding interface has lost all slave links. Then the
driver will immediately reuse the first link that goes up, even if the updelay
parameter was specified. (If there are slave interfaces in the "updelay" state,
the interface that first went into that state will be immediately reused.) This
reduces downtime if the value of updelay has been overestimated.

Examples :

    # modprobe bonding miimon=100 mode=1 downdelay=2000 updelay=5000
    # modprobe bonding miimon=100 mode=balance-rr downdelay=0 updelay=5000
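Since downdelay and updelay must be multiples of miimon, it can help to check the values before loading the module. The function below is a hypothetical helper (not part of the bonding tools) that performs that check and prints a rough worst-case time, in milliseconds, during which a failed link may still be in use: up to one miimon interval to notice the failure, plus downdelay. That figure is an estimate derived from the descriptions above, not a guarantee from the driver.

```shell
# Sketch (hypothetical helper): validate delay parameters against miimon and
# print an estimated worst-case time (ms) before a failed link is disabled,
# assumed here to be one miimon interval plus downdelay.
#
# usage: check_bond_delays MIIMON DOWNDELAY UPDELAY
check_bond_delays() {
    miimon=$1 downdelay=$2 updelay=$3
    if [ "$miimon" -le 0 ]; then
        echo "miimon must be non-zero for link monitoring" >&2
        return 1
    fi
    for d in "$downdelay" "$updelay"; do
        if [ $(( d % miimon )) -ne 0 ]; then
            echo "delay ${d}ms is not a multiple of miimon=${miimon}ms" >&2
            return 1
        fi
    done
    echo $(( miimon + downdelay ))
}
```

For the first example above, "check_bond_delays 100 2000 5000 && modprobe bonding miimon=100 mode=1 downdelay=2000 updelay=5000" prints 2100 and then loads the module; a value such as downdelay=150 would be rejected.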


Promiscuous Sniffing notes
==========================

If you wish to bond channels together for a network sniffing application (for
example, to run tcpdump, ethereal, or an IDS like snort with its input
aggregated from multiple interfaces by the bonding driver), then you need to
handle the promiscuous interface setting by hand. Specifically, when you run
"ifconfig bond0 up" you must add the promisc flag there; it will be propagated
down to the slave interfaces at ifenslave time. A full example might look like:

   # append (never overwrite) the bond0 alias if it is missing
   grep -q bond0 /etc/modules.conf || echo "alias bond0 bonding" >> /etc/modules.conf
   ifconfig bond0 promisc up
   for iface in eth1 eth2 ...; do
       ifconfig $iface up
       ifenslave bond0 $iface     # promisc is propagated to each slave here
   done
   snort ... -i bond0 ...

ifenslave also tries to propagate addresses from interface to interface, as
suits its intended roles in HA and channel capacity aggregation; it
nevertheless works fine with unnumbered interfaces, so just ignore all the
warnings it emits.


8021q VLAN support
==================

It is possible to configure VLAN devices over a bond interface using the 8021q
driver. However, by default only packets coming from the 8021q driver and
passing through bonding will be tagged. Self-generated packets, like bonding's
learning packets or ARP packets generated by either ALB mode or the ARP
monitor mechanism, are tagged internally by bonding itself. As a result,
bonding has to "learn" which VLAN IDs are configured on top of it, and it uses
those IDs to tag self-generated packets.

For simplicity, and to support adapters that can do VLAN hardware acceleration
offloading, the bonding interface declares itself fully capable of hardware
offloading; it receives the add_vid/kill_vid notifications to gather the
necessary information, and it propagates those actions to the slaves.
In case of mixed adapter types, hardware-accelerated tagged packets that should
go through an adapter that is not offloading capable are "un-accelerated" by the
bonding driver so the VLAN tag sits in the regular location.
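For reference, "the regular location" of a VLAN tag is the standard IEEE 802.1Q one: a 4-byte tag (TPID 0x8100 followed by a 16-bit TCI holding a 3-bit priority, a 1-bit DEI flag and a 12-bit VLAN ID) inserted between the source MAC address and the original EtherType. The sketch below builds such a tag with shell arithmetic on hex strings. It is a generic illustration of the 802.1Q format, not bonding's own tagging code, and the function names are made up.

```shell
# Generic 802.1Q illustration (hypothetical helpers, not bonding's code).
#
# vlan_tag_hex VID [PRIORITY]: print the 4-byte tag as hex, i.e. TPID 0x8100
# followed by the TCI (3-bit priority, DEI left at 0, 12-bit VLAN ID).
vlan_tag_hex() {
    vid=$1
    prio=${2:-0}
    # VLAN IDs 0 and 4095 are reserved; priority is a 3-bit field.
    if [ "$vid" -lt 1 ] || [ "$vid" -gt 4094 ]; then return 1; fi
    if [ "$prio" -lt 0 ] || [ "$prio" -gt 7 ]; then return 1; fi
    printf '8100%04x\n' $(( (prio << 13) | vid ))
}

# vlan_insert_tag FRAME_HEX VID [PRIORITY]: insert the tag right after the
# 6-byte destination and 6-byte source MAC addresses (24 hex digits), in
# front of the original EtherType.
vlan_insert_tag() {
    if [ ${#1} -lt 28 ]; then return 1; fi   # need both MACs + EtherType
    tag=$(vlan_tag_hex "$2" "${3:-}") || return 1
    macs=$(printf '%s' "$1" | cut -c1-24)
    rest=$(printf '%s' "$1" | cut -c25-)
    printf '%s%s%s\n' "$macs" "$tag" "$rest"
}
```

For example, "vlan_insert_tag ffffffffffff0011223344550806 10" prints ffffffffffff0011223344558100000a0806, i.e. the header of a broadcast ARP frame tagged with VLAN 10.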

VLAN interfaces *must* be added on top of a bonding interface only after
enslaving at least one slave. This is because until the first slave is added the
bonding interface has a HW address of 00:00:00:00:00:00, which will be copied by
the VLAN interface when it is created.

Notice that a problem would occur if all slaves are released from a bond that
still has VLAN interfaces on top of it. When new slaves are added later, the
bonding interface would get a HW address from the first slave, which might not
match that of the VLAN interfaces. It is recommended either to remove all VLANs
and then re-add them, or to manually set the bonding interface's HW address so
it matches the VLANs'. (Note: changing a VLAN interface's HW address would set
the underlying device, i.e. the bonding interface, to promiscuous mode, which
might not be what you want.)
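Both pitfalls above come down to comparing HW addresses. A minimal sketch, assuming the net-tools ifconfig output format ("HWaddr xx:..." in the older layout, "ether xx:..." in the newer one); hwaddr_of and vlan_mac_check are names invented for this example.

```shell
# hwaddr_of (hypothetical helper): read "ifconfig IFACE" output on standard
# input and print the interface's HW address.
hwaddr_of() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "HWaddr" || $i == "ether") { print $(i + 1); exit }
    }'
}

# vlan_mac_check BOND_MAC [VLAN_MAC] (hypothetical helper): fail if the bond
# still has the all-zero address (no slave enslaved yet), or if a VLAN
# address is given and differs from the bond's.
vlan_mac_check() {
    if [ -z "$1" ] || [ "$1" = 00:00:00:00:00:00 ]; then
        echo "bond has no usable HW address yet: enslave a slave first" >&2
        return 1
    fi
    if [ -n "${2:-}" ] && [ "$1" != "$2" ]; then
        echo "bond HW address $1 differs from VLAN HW address $2" >&2
        return 1
    fi
    return 0
}
```

Possible usage:

    vlan_mac_check "$(ifconfig bond0 | hwaddr_of)" && vconfig add bond0 10
    vlan_mac_check "$(ifconfig bond0 | hwaddr_of)" "$(ifconfig bond0.10 | hwaddr_of)"

The first line only creates VLAN 10 once a slave has given bond0 a real address; the second reports a mismatch after all slaves have been replaced, in which case setting the bond's address by hand ("ifconfig bond0 hw ether <MAC of the VLAN interface>") is the manual fix mentioned above. The VLAN interface name bond0.10 is an assumption; it depends on how vconfig is set up to name interfaces.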


Limitations
===========
The main limitations are :
  - only the link status is monitored. If the switch on the other side is
    partially down (e.g. it no longer forwards traffic, but the link is OK),
    the link won't be disabled. Another way to check for a dead link could be
    to count incoming frames on a heavily loaded host. This is not applicable
    to small servers, but may be useful when the front switches send multicast
    information on their links (e.g. VRRP), or even health-check the servers.
    Use the arp_interval/arp_ip_target parameters to count incoming/outgoing
    frames.
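The frame-counting idea above can be approximated from userspace with the per-interface counters in /proc/net/dev. A minimal sketch, assuming the usual /proc/net/dev layout in which the receive packet count is the second number after "IFACE:" (the first is the byte count); rx_packets is a hypothetical helper.

```shell
# Sketch (hypothetical helper): print the receive "packets" counter of
# interface IFACE from /proc/net/dev-formatted text on standard input.
# Exits non-zero if IFACE is not listed.
rx_packets() {
    awk -v ifc="$1" '
        {
            line = $0
            sub(/^ +/, "", line)           # interface names are right-aligned
        }
        index(line, ifc ":") == 1 {
            sub(/^[^:]*:/, "", line)       # drop "IFACE:", which may touch the byte count
            split(line, f, " ")
            print f[2]                     # f[1] is bytes, f[2] is packets
            found = 1
        }
        END { exit !found }'
}
```

Possible usage:

    before=$(rx_packets eth0 < /proc/net/dev); sleep 10
    after=$(rx_packets eth0 < /proc/net/dev)
    [ "$after" -gt "$before" ] || echo "eth0: no frames received in 10 s"

This ignores counter wrap-around, and on a quiet network a stalled counter does not by itself prove that the link is dead.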


Resources and Links
===================

Current development on this driver is posted to:
 - http://www.sourceforge.net/projects/bonding/

Donald Becker's Ethernet Drivers and diag programs may be found at :
 - http://www.scyld.com/network/

You will also find a lot of information regarding Ethernet, NWay, MII, etc. at
www.scyld.com.

Patches for 2.2 kernels are at Willy Tarreau's site :
 - http://wtarreau.free.fr/pub/bonding/
 - http://www-miaif.lip6.fr/~tarreau/pub/bonding/

To get the latest information about Linux kernel development, please consult
the Linux Kernel Mailing List Archives at :
   http://www.ussg.iu.edu/hypermail/linux/kernel/

-- END --
