Debian Bug report logs - #609242
ifupdown doesn't wait for the bonding interface to come up before exiting, causing other services that depend on $network to fail to start

Package: ifenslave-2.6; Maintainer for ifenslave-2.6 is Guus Sliepen <guus@debian.org>; Source for ifenslave-2.6 is src:ifenslave.

Reported by: Eric Belhomme <rico-debian-bts@ricozome.net>

Date: Fri, 7 Jan 2011 17:51:01 UTC

Severity: normal

Tags: squeeze-ignore

Found in version ifenslave-2.6/1.1.0-17



Report forwarded to debian-bugs-dist@lists.debian.org, rico@eve-team.com, Guus Sliepen <guus@debian.org>:
Bug#609242; Package ifenslave-2.6. (Fri, 07 Jan 2011 17:51:04 GMT)

Acknowledgement sent to Eric Belhomme <rico-debian-bts@ricozome.net>:
New Bug report received and forwarded. Copy sent to rico@eve-team.com, Guus Sliepen <guus@debian.org>. (Fri, 07 Jan 2011 17:51:04 GMT)

Message #5 received at submit@bugs.debian.org:

From: Eric Belhomme <rico-debian-bts@ricozome.net>
To: Debian Bug Tracking System <submit@bugs.debian.org>
Subject: ifupdown don't wait for bonding goes up before exiting, causing other services depending on $network to fail starting
Date: Fri, 07 Jan 2011 18:39:53 +0100
Package: ifenslave-2.6
Version: 1.1.0-17
Severity: grave
Tags: squeeze

Hi,

I have run into a serious issue on a Dell R310 server with both of its NICs bonded the Debian way:

* Hardware details for the NICs:
02:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5716 Gigabit Ethernet (rev 20)
02:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5716 Gigabit Ethernet (rev 20)

* Running kernel:
ii  linux-headers-2.6.32-5-amd64        2.6.32-29                   Header files for Linux 2.6.32-5-amd64
ii  linux-headers-2.6.32-5-common       2.6.32-29                   Common header files for Linux 2.6.32-5
ii  firmware-bnx2                       0.27                        Binary firmware for Broadcom NetXtremeII

* Bonding configuration (interfaces file):
auto bond0
iface bond0 inet static
	slaves eth0 eth1
	bond_mode 802.3ad
	bond_xmit_hash_policy layer2+3
	bond_miimon 100
	bond_downdelay 5000
	bond_updelay 10000
	address 192.168.1.10
	netmask 255.255.255.0
	gateway 192.168.1.254

As stated in the subject, ifupdown exits before the bond interface is active. I tried raising the updelay to give the slaves time to become active, but it has no effect:

[   11.426497] bnx2: eth0 NIC Copper Link is Up, 1000 Mbps full duplex
[   11.478596] bonding: bond0: link status up for interface eth0, enabling it in 0 ms.
[   11.486325] bonding: bond0: link status definitely up for interface eth0.
[   11.493738] ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
[   11.515333] bnx2: eth1 NIC Copper Link is Up, 1000 Mbps full duplex
[   11.590403] bonding: bond0: link status up for interface eth1, enabling it in 10000 ms.
Starting LDAP connection daemon: nslcd[   21.581245] bonding: bond0: link status definitely up for interface eth1.
nslcd: failed to bind to LDAP server ldaps://ldap.eve/: Can't contact LDAP server: Connection timed out
nslcd: no available LDAP server found
nslcd: no base defined in config and couldn't get one from server
 failed!

You can see in this log that sysv-rc tries to start the nslcd daemon *before* the bonding module reports bond0 as actually up, causing nslcd to fail to start. As everything in my setup relies on LDAP for authentication, nothing works until I log in locally as root and manually restart the failed services.

I'm not sure whether I should assign this bug to the ifenslave or the ifupdown package, so sorry for the noise if I'm wrong!

Regards,

-- 
Eric

-- System Information:
Debian Release: squeeze/sid
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.32-5-amd64 (SMP w/8 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages ifenslave-2.6 depends on:
ii  iproute                       20100519-3 networking and traffic control too
ii  libc6                         2.11.2-7   Embedded GNU C Library: Shared lib

Versions of packages ifenslave-2.6 recommends:
ii  net-tools                     1.60-23    The NET-3 networking toolkit

ifenslave-2.6 suggests no packages.

-- no debconf information




Information forwarded to debian-bugs-dist@lists.debian.org, Guus Sliepen <guus@debian.org>:
Bug#609242; Package ifenslave-2.6. (Fri, 07 Jan 2011 18:09:03 GMT)

Acknowledgement sent to Eric Belhomme <rico-debian-bts@ricozome.net>:
Extra info received and forwarded to list. Copy sent to Guus Sliepen <guus@debian.org>. (Fri, 07 Jan 2011 18:09:03 GMT)

Message #10 received at 609242@bugs.debian.org:

From: Eric Belhomme <rico-debian-bts@ricozome.net>
To: 609242@bugs.debian.org
Subject: Re: Bug#609242: Acknowledgement (ifupdown don't wait for bonding goes up before exiting, causing other services depending on $network to fail starting)
Date: Fri, 07 Jan 2011 19:00:28 +0100
On 07/01/2011 18:51, Debian Bug Tracking System wrote:

I forgot to mention: a workaround I found consists of adding this to the
interfaces config file:

post-up /bin/ping -c 8 -i 5 ldap.localnet > /tmp/ping

This forces ifupdown to wait until the ping command exits, but you'll
agree it's not very clean...
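
A less blunt variant of this workaround would wait for the bonding driver itself to report the slaves as up, instead of sleeping on ping. The following is only a minimal sketch in Python. It assumes that the driver's status file (/proc/net/bonding/bond0) lists each slave as a "Slave Interface:" line followed by an "MII Status:" line. slave_states, wait_for_slaves and SAMPLE_STATUS are hypothetical names, and the sample text is illustrative, not output from this machine.

```python
import time

# Illustrative excerpt in the style of /proc/net/bonding/bond0
# (an assumption for this sketch, not output taken from this report).
SAMPLE_STATUS = """\
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
MII Status: up

Slave Interface: eth0
MII Status: up

Slave Interface: eth1
MII Status: down
"""


def slave_states(status_text):
    """Map each slave name to the MII status reported right after it."""
    states = {}
    current = None
    for line in status_text.splitlines():
        line = line.strip()
        if line.startswith("Slave Interface:"):
            current = line.split(":", 1)[1].strip()
        elif line.startswith("MII Status:") and current is not None:
            states[current] = line.split(":", 1)[1].strip()
            current = None  # ignore later lines until the next slave header
    return states


def wait_for_slaves(read_status, wanted, timeout=30.0, interval=0.5,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll read_status() until every wanted slave is "up" or timeout expires.

    Returns True if all wanted slaves came up, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        states = slave_states(read_status())
        if all(states.get(name) == "up" for name in wanted):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Called from a post-up line with read_status reading /proc/net/bonding/bond0 and wanted set to ["eth0", "eth1"], it would return as soon as both slaves are up, or give up after the timeout. This has not been tested on the setup above.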

Regards,

-- 
Eric Belhomme




Removed tag(s) squeeze. Request was from Mehdi Dogguy <mehdi@debian.org> to control@bugs.debian.org. (Fri, 07 Jan 2011 18:33:06 GMT)

Information forwarded to debian-bugs-dist@lists.debian.org, Guus Sliepen <guus@debian.org>:
Bug#609242; Package ifenslave-2.6. (Tue, 18 Jan 2011 11:39:05 GMT)

Acknowledgement sent to Julien Cristau <jcristau@debian.org>:
Extra info received and forwarded to list. Copy sent to Guus Sliepen <guus@debian.org>. (Tue, 18 Jan 2011 11:39:05 GMT)

Message #17 received at 609242@bugs.debian.org:

From: Julien Cristau <jcristau@debian.org>
To: Eric Belhomme <rico-debian-bts@ricozome.net>, 609242@bugs.debian.org
Subject: Re: Bug#609242: ifupdown don't wait for bonding goes up before exiting, causing other services depending on $network to fail starting
Date: Tue, 18 Jan 2011 12:36:41 +0100
user release.debian.org@packages.debian.org
usertag 609242 squeeze-can-defer
tag 609242 squeeze-ignore
kthxbye

On Fri, Jan  7, 2011 at 18:39:53 +0100, Eric Belhomme wrote:

> As stated in the subject, ifupdown exits before the bond interface is active. I tried raising the updelay to give the slaves time to become active, but it has no effect:
> 
> [   11.426497] bnx2: eth0 NIC Copper Link is Up, 1000 Mbps full duplex
> [   11.478596] bonding: bond0: link status up for interface eth0, enabling it in 0 ms.
> [   11.486325] bonding: bond0: link status definitely up for interface eth0.
> [   11.493738] ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
> [   11.515333] bnx2: eth1 NIC Copper Link is Up, 1000 Mbps full duplex
> [   11.590403] bonding: bond0: link status up for interface eth1, enabling it in 10000 ms.
> Starting LDAP connection daemon: nslcd[   21.581245] bonding: bond0: link status definitely up for interface eth1.
> nslcd: failed to bind to LDAP server ldaps://ldap.eve/: Can't contact LDAP server: Connection timed out
> nslcd: no available LDAP server found
> nslcd: no base defined in config and couldn't get one from server
>  failed!
> 
> You can see in this log that sysv-rc tries to start the nslcd daemon *before* the bonding module reports bond0 as actually up, causing nslcd to fail to start. As everything in my setup relies on LDAP for authentication, nothing works until I log in locally as root and manually restart the failed services.
> 
> I'm not sure whether I should assign this bug to the ifenslave or the ifupdown package, so sorry for the noise if I'm wrong!
> 
Considering that this doesn't sound like a new bug, you have a
workaround (if hacky) and this hopefully won't affect too many people,
I'm tagging this as not a blocker for the squeeze release.  If a fix is
available later it can be applied in a point release.

Cheers,
Julien

Added tag(s) squeeze-ignore. Request was from Julien Cristau <jcristau@debian.org> to control@bugs.debian.org. (Tue, 18 Jan 2011 11:39:06 GMT)

Severity set to 'normal' from 'grave'. Request was from Guus Sliepen <guus@debian.org> to control@bugs.debian.org. (Mon, 14 Mar 2011 21:15:02 GMT)

Information forwarded to debian-bugs-dist@lists.debian.org:
Bug#609242; Package ifenslave-2.6. (Mon, 14 Mar 2011 21:24:06 GMT)

Acknowledgement sent to Guus Sliepen <guus@debian.org>:
Extra info received and forwarded to list. (Mon, 14 Mar 2011 21:24:06 GMT)

Message #26 received at 609242@bugs.debian.org:

From: Guus Sliepen <guus@debian.org>
To: Eric Belhomme <rico-debian-bts@ricozome.net>, 609242@bugs.debian.org
Subject: Re: Bug#609242: ifupdown don't wait for bonding goes up before exiting, causing other services depending on $network to fail starting
Date: Mon, 14 Mar 2011 22:12:45 +0100
severity 609242 normal
thanks

My apologies for the long time it took me to answer this email!

> I have run into a serious issue on a Dell R310 server with both of its NICs bonded the Debian way:
[...]
> auto bond0
> iface bond0 inet static
> 	slaves eth0 eth1
> 	bond_mode 802.3ad
> 	bond_xmit_hash_policy layer2+3
> 	bond_miimon 100
> 	bond_downdelay 5000
> 	bond_updelay 10000
> 	address 192.168.1.10
> 	netmask 255.255.255.0
> 	gateway 192.168.1.254
> 
> As stated in the subject, ifupdown exits before the bond interface is active. I tried raising the updelay to give the slaves time to become active, but it has no effect:
> 
> [   11.426497] bnx2: eth0 NIC Copper Link is Up, 1000 Mbps full duplex
> [   11.478596] bonding: bond0: link status up for interface eth0, enabling it in 0 ms.
> [   11.486325] bonding: bond0: link status definitely up for interface eth0.
> [   11.493738] ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready

According to those log entries, the bond0 interface has become active, because
the first Ethernet interface has been detected and is enabled immediately.

> [   11.515333] bnx2: eth1 NIC Copper Link is Up, 1000 Mbps full duplex
> [   11.590403] bonding: bond0: link status up for interface eth1, enabling it in 10000 ms.

The second Ethernet interface is also detected, but since the bond0 interface
is already active, the kernel waits bond_updelay milliseconds before enabling
it. This is correct behaviour.
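
The timestamps in the quoted log bear this out. As a small illustration, here is a Python sketch that computes, per slave, the delay between "link status up" and "link status definitely up". The regular expression and the enable_delays name are hypothetical, and the nslcd prefix has been trimmed from the last log line.

```python
import re

# Bonding lines copied from the kernel log quoted in this report.
LOG = """\
[   11.478596] bonding: bond0: link status up for interface eth0, enabling it in 0 ms.
[   11.486325] bonding: bond0: link status definitely up for interface eth0.
[   11.590403] bonding: bond0: link status up for interface eth1, enabling it in 10000 ms.
[   21.581245] bonding: bond0: link status definitely up for interface eth1.
"""

# Timestamp, link state ("up" or "definitely up") and slave name.
PATTERN = re.compile(
    r"\[\s*(\d+\.\d+)\] bonding: bond0: link status "
    r"(definitely up|up) for interface (\w+)"
)


def enable_delays(log_text):
    """Return seconds between link detection and enabling, per slave."""
    detected = {}
    delays = {}
    for stamp, state, iface in PATTERN.findall(log_text):
        when = float(stamp)
        if state == "up":
            detected[iface] = when
        elif iface in detected:
            delays[iface] = round(when - detected[iface], 3)
    return delays


print(enable_delays(LOG))
```

For these lines, eth0 is enabled within about 8 ms and eth1 after about 9.99 s, in line with bond_updelay 10000.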

> Starting LDAP connection daemon: nslcd[   21.581245] bonding: bond0: link status definitely up for interface eth1.
> nslcd: failed to bind to LDAP server ldaps://ldap.eve/: Can't contact LDAP server: Connection timed out
> nslcd: no available LDAP server found
> nslcd: no base defined in config and couldn't get one from server
>  failed!
> 
> You can see in this log that sysv-rc tries to start the nslcd daemon *before* the bonding module reports bond0 as actually up, causing nslcd to fail to start. As everything in my setup relies on LDAP for authentication, nothing works until I log in locally as root and manually restart the failed services.
> 
> I'm not sure whether I should assign this bug to the ifenslave or the ifupdown package, so sorry for the noise if I'm wrong!

This does not seem like a bug in either ifenslave or ifupdown to me. I have
reproduced your setup, and I see the same things in the kernel logs. Also,
running "ifconfig bond0" and "ethtool bond0" immediately after "ifup bond0"
shows that the bond0 interface is properly configured and up. I can send
packets immediately to the bond0 device, and they come out of the first slave
as expected.

Perhaps there is another reason why the LDAP connection right after the bond0
device is up does not work?  If you start pinging immediately after ifup bond0,
do you see responses immediately as well, or if not, how long does it take for
them to arrive? Perhaps you can run tcpdump on both sides to see when packets
start to flow?
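
One way to put a number on "how long does it take" without watching ping output is sketched below in Python. It assumes that a TCP connection attempt to the LDAP server is an acceptable stand-in for ping: it needs no raw-socket privileges, but it tests the service port rather than ICMP. time_until_reachable is a hypothetical helper; port 636 is the standard ldaps port.

```python
import socket
import time


def time_until_reachable(host, port, deadline=60.0, interval=0.5,
                         connect_timeout=1.0):
    """Retry a TCP connect until it succeeds.

    Returns the elapsed seconds at the first successful connection,
    or None if `deadline` seconds pass without one.
    """
    start = time.monotonic()
    while True:
        try:
            # Name resolution errors and timeouts are OSError subclasses too.
            with socket.create_connection((host, port),
                                          timeout=connect_timeout):
                return time.monotonic() - start
        except OSError:
            if time.monotonic() - start >= deadline:
                return None
            time.sleep(interval)
```

Run right after ifup bond0, for example as time_until_reachable("ldap.eve", 636), it returns the number of seconds until the first successful connection, or None if the deadline passes.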

Also, if there is a misconfiguration and your bonding setup somehow needs eth1
to be active as well, then it would actually take bond_updelay milliseconds
before you have a working connection. You could try to set bond_updelay to 0 to
rule this out.

If your machine needs LDAP to work, I would check whether the LDAP daemon can
be configured to try to connect to the server indefinitely, as opposed to
quitting when the first connection doesn't work.
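
As a generic Python sketch only, this is what "try indefinitely" could look like on the client side: retry with a capped exponential backoff rather than quitting on the first failure. This is not nslcd's code or configuration. connect_with_retry and backoff_delays are hypothetical names, and connect stands for any callable that raises OSError while the server is unreachable.

```python
import time


def backoff_delays(initial=1.0, factor=2.0, cap=30.0):
    """Yield an endless series of delays: initial, then growing up to cap."""
    delay = initial
    while True:
        yield delay
        delay = min(delay * factor, cap)


def connect_with_retry(connect, delays=None, sleep=time.sleep):
    """Call connect() until it succeeds, sleeping between failed attempts.

    With the default endless delay series this never gives up. If a
    finite `delays` iterable is passed and runs out, None is returned.
    """
    if delays is None:
        delays = backoff_delays()
    for delay in delays:
        try:
            return connect()
        except OSError:
            sleep(delay)
    return None
```

The cap keeps a long outage from stretching the gap between attempts without bound, so the service comes back soon after the network does.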

-- 
Met vriendelijke groet / with kind regards,
      Guus Sliepen <guus@debian.org>



Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Thu Apr 17 04:40:08 2014; Machine Name: beach.debian.org
