                Linux Ethernet Bonding Driver mini-howto

Initial release : Thomas Davis <tadavis at lbl.gov>
Corrections, HA extensions : 2000/10/03-15 :
  - Willy Tarreau <willy at meta-x.org>
  - Constantine Gavrilov <const-g at xpert.com>
  - Chad N. Tindel <ctindel at ieee dot org>
  - Janice Girouard <girouard at us dot ibm dot com>
  - Jay Vosburgh <fubar at us dot ibm dot com>

Note :
------
The bonding driver originally came from Donald Becker's beowulf patches for
kernel 2.0. It has changed quite a bit since, and the original tools from
extreme-linux and beowulf sites will not work with this version of the driver.

For new versions of the driver, patches for older kernels and the updated
userspace tools, please follow the links at the end of this file.


Table of Contents
=================

Installation
Bond Configuration
Module Parameters
Configuring Multiple Bonds
Switch Configuration
Verifying Bond Configuration
Frequently Asked Questions
High Availability
Promiscuous Sniffing notes
8021q VLAN support
Limitations
Resources and Links


Installation
============

1) Build kernel with the bonding driver
---------------------------------------
For the latest version of the bonding driver, use kernel 2.4.12 or above
(otherwise you will need to apply a patch).

Configure kernel with `make menuconfig/xconfig/config', and select "Bonding
driver support" in the "Network device support" section. It is recommended
to configure the driver as a module since it is currently the only way to
pass parameters to the driver and configure more than one bonding device.

Build and install the new kernel and modules.

2) Get and install the userspace tools
--------------------------------------
This version of the bonding driver requires an updated ifenslave program. The
original one from extreme-linux and beowulf will not work.
Kernels 2.4.12 and above include the updated version of ifenslave.c in the
Documentation/networking directory. For older kernels, please follow the
links at the end of this file.

IMPORTANT!!! If you are running on Red Hat 7.1 or greater, you need
to be careful because /usr/include/linux is no longer a symbolic link
to /usr/src/linux/include/linux. If you build ifenslave while this is
true, ifenslave will appear to succeed but your bond won't work. The purpose
of the -I option on the ifenslave compile line is to make sure it uses
/usr/src/linux/include/linux/if_bonding.h instead of the version from
/usr/include/linux.

To install ifenslave.c, do:
    # gcc -Wall -Wstrict-prototypes -O -I/usr/src/linux/include ifenslave.c -o ifenslave
    # cp ifenslave /sbin/ifenslave


Bond Configuration
==================

You will need to add at least the following line to /etc/modules.conf
so the bonding driver will automatically load when the bond0 interface is
configured. Refer to the modules.conf manual page for specific modules.conf
syntax details. The Module Parameters section of this document describes each
bonding driver parameter.

        alias bond0 bonding

Use standard distribution techniques to define the bond0 network interface. For
example, on modern Red Hat distributions, create an ifcfg-bond0 file in
the /etc/sysconfig/network-scripts directory that resembles the following:

DEVICE=bond0
IPADDR=192.168.1.1
NETMASK=255.255.255.0
NETWORK=192.168.1.0
BROADCAST=192.168.1.255
ONBOOT=yes
BOOTPROTO=none
USERCTL=no

(use appropriate values for your network above)

All interfaces that are part of a bond should have SLAVE and MASTER
definitions.
For example, in the case of Red Hat, if you wish to make eth0 and
eth1 a part of the bonding interface bond0, their config files (ifcfg-eth0 and
ifcfg-eth1) should resemble the following:

DEVICE=eth0
USERCTL=no
ONBOOT=yes
MASTER=bond0
SLAVE=yes
BOOTPROTO=none

Use DEVICE=eth1 in the ifcfg-eth1 config file. If you configure a second
bonding interface (bond1), use MASTER=bond1 in the config file to make the
network interface a slave of bond1.

Restart the networking subsystem or just bring up the bonding device if your
administration tools allow it. Otherwise, reboot. On Red Hat distros you can
issue `ifup bond0' or `/etc/rc.d/init.d/network restart'.

If the administration tools of your distribution do not support
master/slave notation in configuring network interfaces, you will need to
manually configure the bonding device with the following commands:

    # /sbin/ifconfig bond0 192.168.1.1 netmask 255.255.255.0 \
      broadcast 192.168.1.255 up

    # /sbin/ifenslave bond0 eth0
    # /sbin/ifenslave bond0 eth1

(use appropriate values for your network above)

You can then create a script containing these commands and place it in the
appropriate rc directory.

If you specifically need all network drivers loaded before the bonding driver,
adding the following line to modules.conf will cause the network driver for
eth0 and eth1 to be loaded before the bonding driver.

probeall bond0 eth0 eth1 bonding

Be careful not to reference bond0 itself at the end of the line, or modprobe
will die in an endless recursive loop.

If running SNMP agents, the bonding driver should be loaded before any network
drivers participating in a bond. This requirement is due to the interface
index (ipAdEntIfIndex) being associated to the first interface found with a
given IP address. That is, there is only one ipAdEntIfIndex for each IP
address.
For example, if eth0 and eth1 are slaves of bond0 and the driver for
eth0 is loaded before the bonding driver, the interface for the IP address
will be associated with the eth0 interface. This configuration is shown below:
the IP address 192.168.1.1 has an interface index of 2, which indexes to eth0
in the ifDescr table (ifDescr.2).

     interfaces.ifTable.ifEntry.ifDescr.1 = lo
     interfaces.ifTable.ifEntry.ifDescr.2 = eth0
     interfaces.ifTable.ifEntry.ifDescr.3 = eth1
     interfaces.ifTable.ifEntry.ifDescr.4 = eth2
     interfaces.ifTable.ifEntry.ifDescr.5 = eth3
     interfaces.ifTable.ifEntry.ifDescr.6 = bond0
     ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.10.10.10 = 5
     ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.192.168.1.1 = 2
     ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.74.20.94 = 4
     ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.127.0.0.1 = 1

This problem is avoided by loading the bonding driver before any network
drivers participating in a bond. Below is an example of loading the bonding
driver first; the IP address 192.168.1.1 is correctly associated with
ifDescr.2.

     interfaces.ifTable.ifEntry.ifDescr.1 = lo
     interfaces.ifTable.ifEntry.ifDescr.2 = bond0
     interfaces.ifTable.ifEntry.ifDescr.3 = eth0
     interfaces.ifTable.ifEntry.ifDescr.4 = eth1
     interfaces.ifTable.ifEntry.ifDescr.5 = eth2
     interfaces.ifTable.ifEntry.ifDescr.6 = eth3
     ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.10.10.10 = 6
     ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.192.168.1.1 = 2
     ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.10.74.20.94 = 5
     ip.ipAddrTable.ipAddrEntry.ipAdEntIfIndex.127.0.0.1 = 1

While some distributions may not report the interface name in ifDescr,
the association between the IP address and IfIndex remains and SNMP
functions such as Interface_Scan_Next will report that association.
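
The manual configuration commands given earlier in this section (ifconfig
followed by one ifenslave call per slave) can be collected into a small
script for the rc directory. The following is only a sketch under stated
assumptions: the function name bond_up and the DRY_RUN variable are
hypothetical and not part of any distribution, and the address values are
the example ones used above.

```shell
#!/bin/sh
# bond_up BOND ADDR NETMASK BROADCAST SLAVE...
# Hypothetical helper: brings up the bonding device, then enslaves each
# listed interface, using the same commands as the manual steps above.
# If DRY_RUN is set to a non-empty value, the commands are only printed.
bond_up() {
    bond=$1 addr=$2 mask=$3 bcast=$4
    shift 4
    run=${DRY_RUN:+echo}
    $run /sbin/ifconfig "$bond" "$addr" netmask "$mask" broadcast "$bcast" up || return 1
    for slave in "$@"; do
        $run /sbin/ifenslave "$bond" "$slave" || return 1
    done
}

# Example call (use appropriate values for your network):
#   bond_up bond0 192.168.1.1 255.255.255.0 192.168.1.255 eth0 eth1
```

Setting DRY_RUN=1 before the call is a convenient way to review the
commands without touching the network configuration.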


Module Parameters
=================

Optional parameters for the bonding driver can be supplied as command line
arguments to the insmod command. Typically, these parameters are specified in
the file /etc/modules.conf (see the manual page for modules.conf). The
available bonding driver parameters are listed below. If a parameter is not
specified the default value is used. When initially configuring a bond, it
is recommended "tail -f /var/log/messages" be run in a separate window to
watch for bonding driver error messages.

It is critical that either the miimon or arp_interval and arp_ip_target
parameters be specified, otherwise serious network degradation will occur
during link failures.

arp_interval

        Specifies the ARP monitoring frequency in milliseconds.
        If ARP monitoring is used in a load-balancing mode (mode 0 or 2), the
        switch should be configured in a mode that evenly distributes packets
        across all links - such as round-robin. If the switch is configured to
        distribute the packets in an XOR fashion, all replies from the ARP
        targets will be received on the same link which could cause the other
        team members to fail. ARP monitoring should not be used in conjunction
        with miimon. A value of 0 disables ARP monitoring. The default value
        is 0.

arp_ip_target

        Specifies the IP addresses to use when arp_interval is > 0. These
        are the targets of the ARP request sent to determine the health of
        the link to the targets. Specify these values in ddd.ddd.ddd.ddd
        format. Multiple IP addresses must be separated by a comma. At least
        one IP address needs to be given for ARP monitoring to work. The
        maximum number of targets that can be specified is set at 16.

downdelay

        Specifies the delay time in milliseconds to disable a link after a
        link failure has been detected.
        This should be a multiple of the miimon
        value, otherwise the value will be rounded. The default value is 0.

lacp_rate

        Option specifying the rate at which we'll ask our link partner to
        transmit LACPDU packets in 802.3ad mode. Possible values are:

        slow or 0
                Request partner to transmit LACPDUs every 30 seconds (default)

        fast or 1
                Request partner to transmit LACPDUs every 1 second

max_bonds

        Specifies the number of bonding devices to create for this
        instance of the bonding driver. E.g., if max_bonds is 3, and
        the bonding driver is not already loaded, then bond0, bond1
        and bond2 will be created. The default value is 1.

miimon

        Specifies the frequency in milliseconds that MII link monitoring
        will occur. A value of zero disables MII link monitoring. A value
        of 100 is a good starting point. See High Availability section for
        additional information. The default value is 0.

mode

        Specifies one of the bonding policies. The default is
        round-robin (balance-rr). Possible values are (you can use
        either the text or numeric option):

        balance-rr or 0

                Round-robin policy: Transmit in a sequential order
                from the first available slave through the last. This
                mode provides load balancing and fault tolerance.

        active-backup or 1

                Active-backup policy: Only one slave in the bond is
                active. A different slave becomes active if, and only
                if, the active slave fails. The bond's MAC address is
                externally visible on only one port (network adapter)
                to avoid confusing the switch. This mode provides
                fault tolerance.

        balance-xor or 2

                XOR policy: Transmit based on [(source MAC address
                XOR'd with destination MAC address) modulo slave
                count]. This selects the same slave for each
                destination MAC address. This mode provides load
                balancing and fault tolerance.
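
As a rough illustration of the XOR formula just given, the slave index for a
source/destination pair can be computed in shell as follows. This is a
sketch under assumptions, not the driver's code: the function name xor_slave
is hypothetical, each MAC address is treated as one 48-bit integer, and the
driver itself may hash on only part of the address.

```shell
#!/bin/sh
# xor_slave SRC_MAC DST_MAC SLAVE_COUNT
# Hypothetical helper: prints (SRC_MAC XOR DST_MAC) modulo SLAVE_COUNT,
# treating each colon-separated MAC address as a 48-bit hex number.
xor_slave() {
    src=$(printf '%s' "$1" | tr -d ':')
    dst=$(printf '%s' "$2" | tr -d ':')
    echo $(( (0x$src ^ 0x$dst) % $3 ))
}

# Example: with two slaves, this pair always maps to slave index 1.
#   xor_slave 00:C0:F0:1F:37:B4 00:11:22:33:44:55 2
```

Because the result depends only on the two addresses and the slave count,
all traffic to a given destination leaves through the same slave for as long
as the slave count does not change.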

        broadcast or 3

                Broadcast policy: transmits everything on all slave
                interfaces. This mode provides fault tolerance.

        802.3ad or 4

                IEEE 802.3ad Dynamic link aggregation. Creates aggregation
                groups that share the same speed and duplex settings.
                Transmits and receives on all slaves in the active
                aggregator.

                Prerequisites:

                1. Ethtool support in the base drivers for retrieving the
                speed and duplex of each slave.

                2. A switch that supports IEEE 802.3ad Dynamic link
                aggregation.

        balance-tlb or 5

                Adaptive transmit load balancing: channel bonding that does
                not require any special switch support. The outgoing
                traffic is distributed according to the current load
                (computed relative to the speed) on each slave. Incoming
                traffic is received by the current slave. If the receiving
                slave fails, another slave takes over the MAC address of
                the failed receiving slave.

                Prerequisite:

                Ethtool support in the base drivers for retrieving the
                speed of each slave.

        balance-alb or 6

                Adaptive load balancing: includes balance-tlb + receive
                load balancing (rlb) for IPv4 traffic and does not require
                any special switch support. The receive load balancing is
                achieved by ARP negotiation. The bonding driver intercepts
                the ARP Replies sent by the server on their way out and
                overwrites the src hw address with the unique hw address of
                one of the slaves in the bond such that different clients
                use different hw addresses for the server.

                Receive traffic from connections created by the server is
                also balanced. When the server sends an ARP Request the
                bonding driver copies and saves the client's IP information
                from the ARP. When the ARP Reply arrives from the client,
                its hw address is retrieved and the bonding driver
                initiates an ARP reply to this client assigning it to one
                of the slaves in the bond.
                A problematic outcome of using
                ARP negotiation for balancing is that each time that an ARP
                request is broadcast it uses the hw address of the
                bond. Hence, clients learn the hw address of the bond and
                the balancing of receive traffic collapses to the current
                slave. This is handled by sending updates (ARP Replies) to
                all the clients with their assigned hw address such that
                the traffic is redistributed. Receive traffic is also
                redistributed when a new slave is added to the bond and
                when an inactive slave is re-activated. The receive load is
                distributed sequentially (round robin) among the group of
                highest speed slaves in the bond.

                When a link is reconnected or a new slave joins the bond
                the receive traffic is redistributed among all active
                slaves in the bond by initiating ARP Replies with the
                selected MAC address to each of the clients. The updelay
                modprobe parameter must be set to a value equal to or greater
                than the switch's forwarding delay so that the ARP Replies
                sent to the clients will not be blocked by the switch.

                Prerequisites:

                1. Ethtool support in the base drivers for retrieving the
                speed of each slave.

                2. Base driver support for setting the hw address of a
                device also when it is open. This is required so that there
                will always be one slave in the team using the bond hw
                address (the curr_active_slave) while having a unique hw
                address for each slave in the bond. If the curr_active_slave
                fails, its hw address is swapped with the new
                curr_active_slave that was chosen.

primary

        A string (eth0, eth2, etc) to equate to a primary device. If this
        value is entered, and the device is on-line, it will be used first
        as the output media. Only when this device is off-line will
        alternate devices be used.
        Otherwise, once a failover is detected
        and a new default output is chosen, it will remain the output media
        until it too fails. This is useful when one slave is preferred
        over another, e.g. when one slave is 1000Mbps and another is
        100Mbps. If the 1000Mbps slave fails and is later restored, it may
        be preferred that the faster slave gracefully become the active
        slave - without deliberately failing the 100Mbps slave. Specifying
        a primary is only valid in active-backup mode.

updelay

        Specifies the delay time in milliseconds to enable a link after a
        link up status has been detected. This should be a multiple of the
        miimon value, otherwise the value will be rounded. The default
        value is 0.

use_carrier

        Specifies whether or not miimon should use MII or ETHTOOL
        ioctls vs. netif_carrier_ok() to determine the link status.
        The MII or ETHTOOL ioctls are less efficient and utilize a
        deprecated calling sequence within the kernel. The
        netif_carrier_ok() relies on the device driver to maintain its
        state with netif_carrier_on/off; at this writing, most, but
        not all, device drivers support this facility.

        If bonding insists that the link is up when it should not be,
        it may be that your network device driver does not support
        netif_carrier_on/off. This is because the default state for
        netif_carrier is "carrier on." In this case, disabling
        use_carrier will cause bonding to revert to the MII / ETHTOOL
        ioctl method to determine the link state.

        A value of 1 enables the use of netif_carrier_ok(), a value of
        0 will use the deprecated MII / ETHTOOL ioctls. The default
        value is 1.


Configuring Multiple Bonds
==========================

If several bonding interfaces are required, either specify the max_bonds
parameter (described above), or load the driver multiple times.
Using the max_bonds parameter is less complicated, but has the limitation
that all bonding instances created will have the same options. Loading the
driver multiple times allows each instance of the driver to have differing
options.

For example, to configure two bonding interfaces, one with MII link
monitoring performed every 100 milliseconds, and one with ARP link
monitoring performed every 200 milliseconds, the /etc/modules.conf should
resemble the following:

alias bond0 bonding
alias bond1 bonding

options bond0 miimon=100
options bond1 -o bonding1 arp_interval=200 arp_ip_target=10.0.0.1

Configuring Multiple ARP Targets
================================

While ARP monitoring can be done with just one target, it can be useful
in a High Availability setup to have several targets to monitor. In the
case of just one target, the target itself may go down or have a problem
making it unresponsive to ARP requests. Having an additional target (or
several) increases the reliability of the ARP monitoring.

Multiple ARP targets must be separated by commas as follows:

# example options for ARP monitoring with three targets
alias bond0 bonding
options bond0 arp_interval=60 arp_ip_target=192.168.0.1,192.168.0.3,192.168.0.9

For just a single target the options would resemble:

# example options for ARP monitoring with one target
alias bond0 bonding
options bond0 arp_interval=60 arp_ip_target=192.168.0.100

Potential Problems When Using ARP Monitor
=========================================

1. Driver support

The ARP monitor relies on the network device driver to maintain two
statistics: the last receive time (dev->last_rx), and the last
transmit time (dev->trans_start).
If the network device driver does
not update one or both of these, then the typical result will be that,
upon startup, all links in the bond will immediately be declared down,
and remain that way. A network monitoring tool (e.g., tcpdump) will
show ARP requests and replies being sent and received on the bonding
device.

The possible resolutions for this are to (a) fix the device driver, or
(b) discontinue the ARP monitor (using miimon as an alternative, for
example).

2. Adventures in Routing

When bonding is set up with the ARP monitor, it is important that the
slave devices not have routes that supersede routes of the master (or,
generally, not have routes at all). For example, suppose the bonding
device bond0 has two slaves, eth0 and eth1, and the routing table is
as follows:

Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
10.0.0.0        0.0.0.0         255.255.0.0     U        40 0          0 eth0
10.0.0.0        0.0.0.0         255.255.0.0     U        40 0          0 eth1
10.0.0.0        0.0.0.0         255.255.0.0     U        40 0          0 bond0
127.0.0.0       0.0.0.0         255.0.0.0       U        40 0          0 lo

In this case, the ARP monitor (and ARP itself) may become confused,
because ARP requests will be sent on one interface (bond0), but the
corresponding reply will arrive on a different interface (eth0). This
reply looks to ARP like an unsolicited ARP reply (because ARP matches
replies on an interface basis), and is discarded. This will likely
still update the receive/transmit times in the driver, but will lose
packets.

The resolution here is simply to ensure that slaves do not have routes
of their own, and if for some reason they must, those routes do not
supersede routes of their master. This should generally be the case,
but unusual configurations or errant manual or automatic static route
additions may cause trouble.
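
A quick way to look for the routing problem described above is to scan the
routing table for entries whose interface is a slave. The following is a
minimal sketch, assuming the table is supplied on standard input in the
format shown above (for instance from `route -n'); the function name
check_slave_routes is hypothetical.

```shell
#!/bin/sh
# check_slave_routes SLAVE...
# Hypothetical helper: reads a routing table on stdin and prints every
# route whose last column (Iface) is one of the named slave interfaces.
# Returns 1 if any such route was found, 0 otherwise.
check_slave_routes() {
    awk -v slaves="$*" '
        BEGIN {
            n = split(slaves, names, " ")
            for (i = 1; i <= n; i++) slave[names[i]] = 1
        }
        ($NF in slave) { print; found = 1 }
        END { exit (found ? 1 : 0) }
    '
}

# Example:
#   route -n | check_slave_routes eth0 eth1
```

Any line printed is a candidate for removal; with the example table above,
the eth0 and eth1 routes would be reported and the bond0 route would not.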

Switch Configuration
====================

While the switch does not need to be configured when the active-backup,
balance-tlb or balance-alb policies (mode=1,5,6) are used, it does need to
be configured for the round-robin, XOR, broadcast, or 802.3ad policies
(mode=0,2,3,4).


Verifying Bond Configuration
============================

1) Bonding information files
----------------------------
The bonding driver information files reside in the /proc/net/bonding directory.

Sample contents of /proc/net/bonding/bond0 after the driver is loaded with
parameters of mode=0 and miimon=1000 are shown below.

        Bonding Mode: load balancing (round-robin)
        Currently Active Slave: eth0
        MII Status: up
        MII Polling Interval (ms): 1000
        Up Delay (ms): 0
        Down Delay (ms): 0

        Slave Interface: eth1
        MII Status: up
        Link Failure Count: 1

        Slave Interface: eth0
        MII Status: up
        Link Failure Count: 1

2) Network verification
-----------------------
The network configuration can be verified using the ifconfig command. In
the example below, the bond0 interface is the master (MASTER) while eth0 and
eth1 are slaves (SLAVE). Notice all slaves of bond0 have the same MAC address
(HWaddr) as bond0 for all modes except TLB and ALB, which require a unique
MAC address for each slave.

[root]# /sbin/ifconfig
bond0     Link encap:Ethernet  HWaddr 00:C0:F0:1F:37:B4
          inet addr:XXX.XXX.XXX.YYY  Bcast:XXX.XXX.XXX.255  Mask:255.255.252.0
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:7224794 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3286647 errors:1 dropped:0 overruns:1 carrier:0
          collisions:0 txqueuelen:0

eth0      Link encap:Ethernet  HWaddr 00:C0:F0:1F:37:B4
          inet addr:XXX.XXX.XXX.YYY  Bcast:XXX.XXX.XXX.255  Mask:255.255.252.0
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:3573025 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1643167 errors:1 dropped:0 overruns:1 carrier:0
          collisions:0 txqueuelen:100
          Interrupt:10 Base address:0x1080

eth1      Link encap:Ethernet  HWaddr 00:C0:F0:1F:37:B4
          inet addr:XXX.XXX.XXX.YYY  Bcast:XXX.XXX.XXX.255  Mask:255.255.252.0
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:3651769 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1643480 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          Interrupt:9 Base address:0x1400


Frequently Asked Questions
==========================

1.  Is it SMP safe?

        Yes. The old 2.0.xx channel bonding patch was not SMP safe.
        The new driver was designed to be SMP safe from the start.

2.  What type of cards will work with it?

        Any Ethernet type cards (you can even mix cards - an Intel
        EtherExpress PRO/100 and a 3com 3c905b, for example).
        You can even bond together Gigabit Ethernet cards!

3.  How many bonding devices can I have?

        There is no limit.

4.  How many slaves can a bonding device have?

        Limited by the number of network interfaces Linux supports and/or the
        number of network cards you can place in your system.

5.  What happens when a slave link dies?

        If your ethernet cards support MII or ETHTOOL link status monitoring
        and the MII monitoring has been enabled in the driver (see description
        of module parameters), there will be no adverse consequences. This
        release of the bonding driver knows how to get the MII information and
        enables or disables its slaves according to their link status.
        See section on High Availability for additional information.

        For ethernet cards not supporting MII status, the arp_interval and
        arp_ip_target parameters must be specified for bonding to work
        correctly. If packets have not been sent or received during the
        specified arp_interval duration, an ARP request is sent to the
        targets to generate send and receive traffic. If after this
        interval either the successful send or receive count has not
        incremented, the next slave in the sequence will become the active
        slave.

        If neither miimon nor arp_interval is configured, the bonding
        driver will not handle this situation very well. The driver will
        continue to send packets but some packets will be lost. Retransmits
        will cause serious degradation of performance (in the case when one
        of two slave links fails, 50% of packets will be lost, which is a
        serious problem for both TCP and UDP).

6.  Can bonding be used for High Availability?

        Yes, if you use MII monitoring and ALL your cards support MII link
        status reporting. See section on High Availability for more
        information.

7.  Which switches/systems does it work with?

        In round-robin and XOR mode, it works with systems that support
        trunking:

        * Many Cisco switches and routers (look for EtherChannel support).
        * SunTrunking software.
        * Alteon AceDirector switches / WebOS (use Trunks).
        * BayStack Switches (trunks must be explicitly configured). Stackable
          models (450) can define trunks between ports on different physical
          units.
        * Linux bonding, of course!

        In 802.3ad mode, it works with systems that support IEEE 802.3ad
        Dynamic Link Aggregation:

        * Extreme networks Summit 7i (look for link-aggregation).
        * Many Cisco switches and routers (look for LACP support; this may
          require an upgrade to your IOS software; LACP support was added
          by Cisco in late 2002).
        * Foundry Big Iron 4000

        In active-backup, balance-tlb and balance-alb modes, it should work
        with any Layer-II switch.


8.  Where does a bonding device get its MAC address from?

        If not explicitly configured with ifconfig, the MAC address of the
        bonding device is taken from its first slave device. This MAC address
        is then passed to all following slaves and remains persistent (even if
        the first slave is removed) until the bonding device is brought
        down or reconfigured.

        If you wish to change the MAC address, you can set it with ifconfig:

          # ifconfig bond0 hw ether 00:11:22:33:44:55

        The MAC address can also be changed by bringing down/up the device
        and then changing its slaves (or their order):

          # ifconfig bond0 down ; modprobe -r bonding
          # ifconfig bond0 .... up
          # ifenslave bond0 eth...

        This method will automatically take the address from the next slave
        that will be added.

        To restore your slaves' MAC addresses, you need to detach them
        from the bond (`ifenslave -d bond0 eth0'). The bonding driver will then
        restore the MAC addresses that the slaves had before they were enslaved.

9.  Which transmit policies can be used?

        Round-robin: based on the order of enslaving, the output device
        is selected based on the next available slave, regardless of
        the source and/or destination of the packet.

        Active-backup policy ensures that one and only one device will
        transmit at any given moment.
        Active-backup policy is useful for
        implementing high availability solutions using two hubs (see
        section on High Availability).

        XOR, based on (src hw addr XOR dst hw addr) % slave count. This
        policy selects the same slave for each destination hw address.

        Broadcast policy transmits everything on all slave interfaces.

        802.3ad, based on XOR but distributes traffic among all interfaces
        in the active aggregator.

        Transmit load balancing (balance-tlb) balances the traffic
        according to the current load on each slave. The balancing is
        client based and the least loaded slave is selected for each new
        client. The load of each slave is calculated relative to its speed
        and enables load balancing in mixed speed teams.

        Adaptive load balancing (balance-alb) uses the Transmit load
        balancing for the transmit load. The receive load is balanced only
        among the group of highest speed active slaves in the bond. The
        load is distributed with round-robin, i.e. next available slave in
        the high speed group of active slaves.

High Availability
=================

To implement high availability using the bonding driver, the driver needs to be
compiled as a module, because currently it is the only way to pass parameters
to the driver. This may change in the future.

High availability is achieved by using MII or ETHTOOL status reporting. You
need to verify that all your interfaces support MII or ETHTOOL link status
reporting. On Linux kernel 2.2.17, all the 100 Mbps capable drivers and
yellowfin gigabit driver support MII. To determine if ETHTOOL link reporting
is available for interface eth0, type "ethtool eth0" and the "Link detected:"
line should contain the correct link status. If your system has an interface
that does not support MII or ETHTOOL status reporting, a failure of its link
will not be detected!
A message indicating MII and ETHTOOL are not supported by
a network driver is logged when the bonding driver is loaded with a non-zero
miimon value.

The bonding driver can regularly check all its slaves' links using the ETHTOOL
IOCTL (ETHTOOL_GLINK command) or by checking the MII status registers. The
check interval is specified by the module argument "miimon" (MII monitoring).
It takes an integer that represents the checking time in milliseconds. It
should not come too close to (1000/HZ) (10 milliseconds on i386) because it
may then reduce the system interactivity. A value of 100 seems to be a good
starting point. It means that a dead link will be detected at most 100
milliseconds after it goes down.

Example:

   # modprobe bonding miimon=100

Or, put the following lines in /etc/modules.conf:

   alias bond0 bonding
   options bond0 miimon=100

There are currently two policies for high availability. They are dependent on
whether:

   a) hosts are connected to a single host or switch that supports trunking

   b) hosts are connected to several different switches or a single switch that
      does not support trunking


1) High Availability on a single switch or host - load balancing
----------------------------------------------------------------
This is the easiest policy to set up and to understand. Simply configure the
remote equipment (host or switch) to aggregate traffic over several
ports (Trunk, EtherChannel, etc.) and configure the bonding interfaces.
If the module has been loaded with the proper MII option, it will work
automatically. You can then try to remove and restore different links
and see in your logs what the driver detects. When testing, you may
encounter problems on some buggy switches that disable the trunk for a
long time if all ports in a trunk go down. This is not Linux, but really
the switch (reboot it to ensure).
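
When removing and restoring links as suggested above, the per-slave state
can also be read from the bonding information file rather than the logs.
The following is a minimal sketch, assuming the file layout shown in the
Verifying Bond Configuration section ("Slave Interface:" followed by
"MII Status:"); the function name bond_slave_status is hypothetical.

```shell
#!/bin/sh
# bond_slave_status [FILE]
# Hypothetical helper: prints "<slave> <MII status>" for each slave listed
# in a bonding information file (default: /proc/net/bonding/bond0).
# The bond-wide "MII Status:" line, which precedes the first slave entry,
# is skipped.
bond_slave_status() {
    awk -F': ' '
        /Slave Interface:/            { slave = $2 }
        /MII Status:/ && slave != ""  { print slave, $2; slave = "" }
    ' "${1:-/proc/net/bonding/bond0}"
}

# Example:
#   bond_slave_status /proc/net/bonding/bond0
```

Running it before and after unplugging a cable gives a quick view of which
slave the driver currently considers down.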

Example 1 : host to host at twice the speed

    +----------+                          +----------+
    |          |eth0                  eth0|          |
    | Host A   +--------------------------+  Host B  |
    |          +--------------------------+          |
    |          |eth1                  eth1|          |
    +----------+                          +----------+

  On each host :
     # modprobe bonding miimon=100
     # ifconfig bond0 addr
     # ifenslave bond0 eth0 eth1

Example 2 : host to switch at twice the speed

    +----------+                          +----------+
    |          |eth0                 port1|          |
    | Host A   +--------------------------+  switch  |
    |          +--------------------------+          |
    |          |eth1                 port2|          |
    +----------+                          +----------+

  On host A :                          On the switch :
     # modprobe bonding miimon=100        # set up a trunk on port1
     # ifconfig bond0 addr                  and port2
     # ifenslave bond0 eth0 eth1


2) High Availability on two or more switches (or a single switch without
   trunking support)
---------------------------------------------------------------------------
This mode is more problematic because it relies on the fact that there
are multiple ports and the host's MAC address should be visible on one
port only to avoid confusing the switches.

If you need to know which interface is the active one, and which ones are
backup, use ifconfig. All backup interfaces have the NOARP flag set.

To use this mode, pass "mode=1" to the module at load time :

    # modprobe bonding miimon=100 mode=active-backup

    or:

    # modprobe bonding miimon=100 mode=1

Or, put in your /etc/modules.conf :

    alias bond0 bonding
    options bond0 miimon=100 mode=active-backup

Example 1: Using multiple hosts and multiple switches to build a "no single
point of failure" solution.


                |                                     |
                |port3                           port3|
          +-----+----+                          +-----+----+
          |          |port7       ISL      port7|          |
          | switch A +--------------------------+ switch B |
          |          +--------------------------+          |
          |          |port8                port8|          |
          +----++----+                          +-----++---+
          port2||port1                           port1||port2
               ||             +-------+               ||
               |+-------------+ host1 +---------------+|
               |         eth0 +-------+ eth1           |
               |                                       |
               |              +-------+                |
               +--------------+ host2 +----------------+
                         eth0 +-------+ eth1

In this configuration, there is an ISL - Inter Switch Link (could be a trunk),
several servers (host1, host2 ...) each attached to both switches, and one or
more ports to the outside world (port3...). One and only one slave on each host
is active at a time, while all links are still monitored (the system can
detect a failure of active and backup links).

Each time a host changes its active interface, it sticks to the new one until
it goes down. In this example, the hosts are negligibly affected by the
expiration time of the switches' forwarding tables.

If host1 and host2 have the same functionality and are used in load balancing
by another external mechanism, it is good to have host1's active interface
connected to one switch and host2's to the other. Such a system will survive
a failure of a single host, cable, or switch. The worst thing that may happen
in the case of a switch failure is that half of the hosts will be temporarily
unreachable until the other switch expires its tables.

Example 2: Using multiple ethernet cards connected to a switch to configure
           NIC failover (switch is not required to support trunking).


    +----------+                          +----------+
    |          |eth0                 port1|          |
    | Host A   +--------------------------+  switch  |
    |          +--------------------------+          |
    |          |eth1                 port2|          |
    +----------+                          +----------+

  On host A :                            On the switch :
     # modprobe bonding miimon=100 mode=1   # (optional) minimize the time
     # ifconfig bond0 addr                  # for table expiration
     # ifenslave bond0 eth0 eth1

Each time the host changes its active interface, it sticks to the new one until
it goes down. In this example, the host is strongly affected by the expiration
time of the switch forwarding table.


3) Adapting to your switches' timing
------------------------------------
If your switches take a long time to go into backup mode, it may be
desirable not to activate a backup interface immediately after a link goes
down. It is possible to delay the moment at which a link will be
completely disabled by passing the module parameter "downdelay" (in
milliseconds, must be a multiple of miimon).

When a switch reboots, it is possible that its ports report "link up" status
before they become usable. This could fool a bond device by causing it to
use some ports that are not ready yet. It is possible to delay the moment at
which an active link will be reused by passing the module parameter "updelay"
(in milliseconds, must be a multiple of miimon).

A similar situation can occur when a host re-negotiates a lost link with the
switch (a case of cable replacement).

A special case is when a bonding interface has lost all slave links. Then the
driver will immediately reuse the first link that goes up, even if the updelay
parameter was specified. (If there are slave interfaces in the "updelay" state,
the interface that first went into that state will be immediately reused.) This
reduces downtime if the value of updelay has been overestimated.
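
Since both downdelay and updelay must be multiples of miimon, a small check
before loading the module can catch a mistyped value. What follows is a
minimal sketch, not part of the driver or of ifenslave: delay_ok and
bonding_cmdline are hypothetical helper names, and mode=1 is hard-coded only
to keep the illustration short.

```shell
# Hypothetical helper: succeed only if the delay ($1, in milliseconds) is a
# whole multiple of a positive miimon value ($2).
delay_ok() {
    [ "$2" -gt 0 ] && [ $(( $1 % $2 )) -eq 0 ]
}

# Hypothetical helper: print a modprobe command line for miimon ($1),
# downdelay ($2) and updelay ($3), but only when both delays are valid.
bonding_cmdline() {
    if delay_ok "$2" "$1" && delay_ok "$3" "$1"; then
        echo "modprobe bonding miimon=$1 mode=1 downdelay=$2 updelay=$3"
    else
        echo "downdelay and updelay must be multiples of miimon=$1" >&2
        return 1
    fi
}
```

Calling "bonding_cmdline 100 2000 5000" prints a command line with those
values, while "bonding_cmdline 100 250 5000" is rejected because 250 is not a
multiple of 100.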

Examples :

    # modprobe bonding miimon=100 mode=1 downdelay=2000 updelay=5000
    # modprobe bonding miimon=100 mode=balance-rr downdelay=0 updelay=5000


Promiscuous Sniffing notes
==========================

If you wish to bond channels together for a network sniffing
application (for example, to run tcpdump, ethereal, or an IDS like
snort with its input aggregated from multiple interfaces using the
bonding driver), you need to handle the promiscuous interface
setting by hand. Specifically, when you run "ifconfig bond0 up" you
must add the promisc flag there; it will be propagated down to the
slave interfaces at ifenslave time. A full example might look like:

   grep bond0 /etc/modules.conf || echo alias bond0 bonding >>/etc/modules.conf
   ifconfig bond0 promisc up
   for if in eth1 eth2 ...;do
       ifconfig $if up
       ifenslave bond0 $if
   done
   snort ... -i bond0 ...

Ifenslave also tries to propagate addresses from interface to
interface, as appropriate for its design role in HA and channel
capacity aggregation, but it works fine for unnumbered interfaces;
just ignore the warnings it emits.


8021q VLAN support
==================

It is possible to configure VLAN devices over a bond interface using the 8021q
driver. However, only packets coming from the 8021q driver and passing through
bonding will be tagged by default. Self-generated packets, like bonding's
learning packets or ARP packets generated by either ALB mode or the ARP
monitor mechanism, are tagged internally by bonding itself. As a result,
bonding has to "learn" what VLAN IDs are configured on top of it, and it uses
those IDs to tag self-generated packets.

For simplicity, and to support the use of adapters that can do VLAN
hardware acceleration offloading, the bonding interface declares itself as
fully hardware offloading capable. It gets the add_vid/kill_vid notifications
to gather the necessary information, and it propagates those actions to the
slaves.
In case of mixed adapter types, hardware accelerated tagged packets that should
go through an adapter that is not offloading capable are "un-accelerated" by the
bonding driver so the VLAN tag sits in the regular location.

VLAN interfaces *must* be added on top of a bonding interface only after
enslaving at least one slave. This is because until the first slave is added the
bonding interface has a HW address of 00:00:00:00:00:00, which will be copied by
the VLAN interface when it is created.

Notice that a problem would occur if all slaves are released from a bond that
still has VLAN interfaces on top of it. When new slaves are added later, the
bonding interface would get a HW address from the first slave, which might not
match that of the VLAN interfaces. It is recommended either to remove all VLANs
and then re-add them, or to manually set the bonding interface's HW
address so it matches the VLANs'. (Note: changing a VLAN interface's HW address
would set the underlying device -- i.e. the bonding interface -- to promiscuous
mode, which might not be what you want).


Limitations
===========
The main limitations are :
  - only the link status is monitored. If the switch on the other side is
    partially down (e.g. doesn't forward anymore, but the link is OK), the link
    won't be disabled. Another way to check for a dead link could be to count
    incoming frames on a heavily loaded host. This is not applicable to small
    servers, but may be useful when the front switches send multicast
    information on their links (e.g. VRRP), or even health-check the servers.
    Use the arp_interval/arp_ip_target parameters to count incoming/outgoing
    frames.



Resources and Links
===================

Current development on this driver is posted to:
 - http://www.sourceforge.net/projects/bonding/

Donald Becker's Ethernet Drivers and diag programs may be found at :
 - http://www.scyld.com/network/

You will also find a lot of information regarding Ethernet, NWay, MII, etc. at
www.scyld.com.

Patches for 2.2 kernels are at Willy Tarreau's site :
 - http://wtarreau.free.fr/pub/bonding/
 - http://www-miaif.lip6.fr/~tarreau/pub/bonding/

To get the latest information about Linux Kernel development, please consult
the Linux Kernel Mailing List Archives at :
   http://www.ussg.iu.edu/hypermail/linux/kernel/

-- END --