Debian Bug report logs - #596983
libnss-ldapd: Fallback to secondary ldap server does not work as expected

Package: libnss-ldapd; Maintainer for libnss-ldapd is Arthur de Jong <adejong@debian.org>; Source for libnss-ldapd is src:nss-pam-ldapd.

Reported by: Matthias Wamser <mw+debian@ilk.net>

Date: Wed, 15 Sep 2010 16:51:01 UTC

Severity: important

Found in version nss-ldapd/0.6.7.2

Fixed in version nss-pam-ldapd/0.7.10

Done: Arthur de Jong <adejong@debian.org>

Bug is archived. No further changes may be made.


Report forwarded to debian-bugs-dist@lists.debian.org, Arthur de Jong <adejong@debian.org>:
Bug#596983; Package libnss-ldapd. (Wed, 15 Sep 2010 16:51:04 GMT)

Acknowledgement sent to Matthias Wamser <mw+debian@ilk.net>:
New Bug report received and forwarded. Copy sent to Arthur de Jong <adejong@debian.org>. (Wed, 15 Sep 2010 16:51:04 GMT)

Message #5 received at submit@bugs.debian.org:

From: Matthias Wamser <mw+debian@ilk.net>
To: Debian Bug Tracking System <submit@bugs.debian.org>
Subject: libnss-ldapd: Fallback to secondary ldap server does not work as expected
Date: Wed, 15 Sep 2010 18:36:27 +0200
Package: libnss-ldapd
Version: 0.6.7.2
Severity: important


Hi,

I wanted to replace my libnss-ldap setup with libnss-ldapd. At first sight it seems to work like a charm.

Response times were much better than before, even without nscd, but fallback to the secondary LDAP server does not work as expected.

If I block all requests to the first LDAP server with iptables, I always get a timeout from nslcd:
nslcd: [b127f8] ldap_result() timed out

It never reconnects to the other server. I tried every timeout setting I could change in /etc/nss-ldapd.conf, and I switched SSL on and off ...

My last (and stupid) attempt was:
threads 1
bind_timelimit 1
timelimit 1
idle_timelimit 10

In the netstat output I see many ESTABLISHED and CLOSE_WAIT connections to the (unreachable) LDAP server.

The only way to connect to the second LDAP server is to kill and restart nslcd (in my test scenario ldap2 is in fact the first server queried):

pkill -9 nslcd
nslcd -d
nslcd: DEBUG: add_uri(ldap://ldap2.xxxxxxxxxx/)
nslcd: DEBUG: add_uri(ldap://ldap1.xxxxxxxxxx/)
nslcd: version 0.6.7 starting
nslcd: DEBUG: setgroups(0,NULL) done
nslcd: DEBUG: setgid(110) done
nslcd: DEBUG: setuid(106) done
nslcd: accepting connections

nslcd: [b0dc51] DEBUG: connection from pid=16207 uid=0 gid=0
nslcd: [b0dc51] DEBUG: nslcd_group_bygid(1111)
nslcd: [b0dc51] DEBUG: myldap_search(base="xxxxxx", filter="(&(objectClass=posixGroup)(gidNumber=1111))")
nslcd: [b0dc51] DEBUG: ldap_result(): end of results

In this case, the first lookup takes bind_timelimit to succeed and subsequent queries go automatically to the fallback server.

But this is definitely not satisfactory. If I use libnss-ldap as before on the same machine, everything works as expected.

So my conclusion is that nslcd connects to the first LDAP server and tries to keep this connection forever. I also waited some minutes and nothing changed. Even if I restart the network interface locally, it does not try to connect to the second server.

The only way to use the fallback server is to restart nslcd.

If you need more information or if I can do some more testing, let me know.

Regards,
matthias

-- System Information:
Debian Release: 5.0.6
  APT prefers stable
  APT policy: (990, 'stable')
Architecture: i386 (i686)

Kernel: Linux 2.6.26-2-686 (SMP w/1 CPU core)
Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968) (ignored: LC_ALL set to C)
Shell: /bin/sh linked to /bin/bash

Versions of packages libnss-ldapd depends on:
ii  adduser         3.110                    add and remove users and groups
ii  debconf [debcon 1.5.24                   Debian configuration management sy
ii  libc6           2.7-18lenny4             GNU C Library: Shared libraries
ii  libkrb53        1.6.dfsg.4~beta1-5lenny4 MIT Kerberos runtime libraries
ii  libldap-2.4-2   2.4.11-1+lenny2          OpenLDAP libraries
ii  libsasl2-2      2.1.22.dfsg1-23+lenny1   Cyrus SASL - authentication abstra

Versions of packages libnss-ldapd recommends:
ii  libpam-ldap                 184-4.2      Pluggable Authentication Module fo
ii  nscd                        2.7-18lenny4 GNU C Library: Name Service Cache 

libnss-ldapd suggests no packages.

-- debconf information excluded




Information forwarded to debian-bugs-dist@lists.debian.org, Arthur de Jong <adejong@debian.org>:
Bug#596983; Package libnss-ldapd. (Mon, 20 Sep 2010 21:15:03 GMT)

Acknowledgement sent to 596983@bugs.debian.org:
Extra info received and forwarded to list. Copy sent to Arthur de Jong <adejong@debian.org>. (Mon, 20 Sep 2010 21:15:03 GMT)

Message #10 received at 596983@bugs.debian.org:

From: Arthur de Jong <adejong@debian.org>
To: Matthias Wamser <mw+debian@ilk.net>, 596983@bugs.debian.org
Subject: Re: Bug#596983: libnss-ldapd: Fallback to secondary ldap server does not work as expected
Date: Mon, 20 Sep 2010 23:12:33 +0200
tags 596983 + pending
thanks

On Wed, 2010-09-15 at 18:36 +0200, Matthias Wamser wrote:
> I wanted to replace my libnss-ldap setup with libnss-ldapd. At first
> sight it seems to work like a charm.
> 
> Response times were even without nscd much better than before, but
> fallback to secondary ldap server does not work as expected.
> 
> If I block all requests to the first ldap server with iptables I always
> get a timeout from nslcd:
> nslcd: [b127f8] ldap_result() timed out

This is indeed a bug in nslcd, thanks for pointing it out. Timeouts from
ldap_result() did not result in a disconnect from the LDAP server. This
has been fixed in the development version.

I don't think a fix for lenny is feasible, but I will try to get this
fix into squeeze.

Btw, I'm not 100% sure that severing a network connection with iptables
simulates a typical network failure. I haven't run into this issue
before in environments I manage.

Anyway, thanks again for pointing this out.

-- 
-- arthur - adejong@debian.org - http://people.debian.org/~adejong --

Added tag(s) pending. Request was from Arthur de Jong <adejong@debian.org> to control@bugs.debian.org. (Mon, 20 Sep 2010 21:15:10 GMT)

Reply sent to Arthur de Jong <adejong@debian.org>:
You have taken responsibility. (Fri, 24 Sep 2010 07:51:08 GMT)

Notification sent to Matthias Wamser <mw+debian@ilk.net>:
Bug acknowledged by developer. (Fri, 24 Sep 2010 07:51:08 GMT)

Message #17 received at 596983-close@bugs.debian.org:

From: Arthur de Jong <adejong@debian.org>
To: 596983-close@bugs.debian.org
Subject: Bug#596983: fixed in nss-pam-ldapd 0.7.10
Date: Fri, 24 Sep 2010 07:47:08 +0000
Source: nss-pam-ldapd
Source-Version: 0.7.10

We believe that the bug you reported is fixed in the latest version of
nss-pam-ldapd, which is due to be installed in the Debian FTP archive:

libnss-ldapd_0.7.10_i386.deb
  to main/n/nss-pam-ldapd/libnss-ldapd_0.7.10_i386.deb
libpam-ldapd_0.7.10_i386.deb
  to main/n/nss-pam-ldapd/libpam-ldapd_0.7.10_i386.deb
nslcd_0.7.10_i386.deb
  to main/n/nss-pam-ldapd/nslcd_0.7.10_i386.deb
nss-pam-ldapd_0.7.10.dsc
  to main/n/nss-pam-ldapd/nss-pam-ldapd_0.7.10.dsc
nss-pam-ldapd_0.7.10.tar.gz
  to main/n/nss-pam-ldapd/nss-pam-ldapd_0.7.10.tar.gz



A summary of the changes between this version and the previous one is
attached.

Thank you for reporting the bug, which will now be closed.  If you
have further comments please address them to 596983@bugs.debian.org,
and the maintainer will reopen the bug report if appropriate.

Debian distribution maintenance software
pp.
Arthur de Jong <adejong@debian.org> (supplier of updated nss-pam-ldapd package)

(This message was generated automatically at their request; if you
believe that there is a problem with it please contact the archive
administrators by mailing ftpmaster@debian.org)


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Format: 1.8
Date: Fri, 24 Sep 2010 09:00:00 +0200
Source: nss-pam-ldapd
Binary: nslcd libnss-ldapd libpam-ldapd
Architecture: source i386
Version: 0.7.10
Distribution: unstable
Urgency: low
Maintainer: Arthur de Jong <adejong@debian.org>
Changed-By: Arthur de Jong <adejong@debian.org>
Description: 
 libnss-ldapd - NSS module for using LDAP as a naming service
 libpam-ldapd - PAM module for using LDAP as an authentication service
 nslcd      - Daemon for NSS and PAM lookups using LDAP
Closes: 596983
Changes: 
 nss-pam-ldapd (0.7.10) unstable; urgency=low
 .
   * handle errors from ldap_result() better and disconnect (and reconnect)
     in more cases (closes: #596983)
Checksums-Sha1: 
 86abbd049496f5af0c93b0b8a05937aede895ce7 1106 nss-pam-ldapd_0.7.10.dsc
 ab0bd5315b516ec3579f2c086606cf74df6e3c54 478118 nss-pam-ldapd_0.7.10.tar.gz
 2965544f1308988e557525123be0cd2c26c4f42e 123168 nslcd_0.7.10_i386.deb
 7e4769029219902fbbfda0b06aa5f5dd78b7bd98 43492 libnss-ldapd_0.7.10_i386.deb
 23f60d2a32b5b1d4aaa6c0e74293b3b0280fb292 36176 libpam-ldapd_0.7.10_i386.deb
Checksums-Sha256: 
 a982254a1a0d876a516f5df956d0d36d4bdc6f56e59d818223d2f2a085b67cd1 1106 nss-pam-ldapd_0.7.10.dsc
 63cb988196cedee7be30aa01034fcbdea17604a03184597a634eb9387622a486 478118 nss-pam-ldapd_0.7.10.tar.gz
 b0f482633b29414b1e334e2bdb0bf962ffe4e11332bf21579e5bedd975d9060d 123168 nslcd_0.7.10_i386.deb
 b0e24a6935eb648671f9270158ee17a4b66663b22fef50c85b92befd53d4bd7c 43492 libnss-ldapd_0.7.10_i386.deb
 20d5c8c7088d1c86425b63096a7de56acc64495194cc7f1f7edf1365389e8653 36176 libpam-ldapd_0.7.10_i386.deb
Files: 
 05184f3049dd6bad5cec28397a88f34d 1106 admin extra nss-pam-ldapd_0.7.10.dsc
 d01c3313712aae7471f22bc7922d892a 478118 admin extra nss-pam-ldapd_0.7.10.tar.gz
 480edd750d43dc02dcf8050fc9879fd3 123168 admin extra nslcd_0.7.10_i386.deb
 f1096ff6aadd35cfa8c8258d957f1716 43492 admin extra libnss-ldapd_0.7.10_i386.deb
 9697b49a775111865a6d2d586479dbe2 36176 admin extra libpam-ldapd_0.7.10_i386.deb

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)

iEYEARECAAYFAkycUsYACgkQVYan35+NCKfFsQCguI3YZdOUDuEe3IPk+BIegNmB
Vq4AoKUyRxTd4uNBL5v8OnXzPAbna6L0
=2SyO
-----END PGP SIGNATURE-----





Information forwarded to debian-bugs-dist@lists.debian.org, Arthur de Jong <adejong@debian.org>:
Bug#596983; Package libnss-ldapd. (Mon, 27 Sep 2010 15:39:06 GMT)

Acknowledgement sent to Matthias Wamser <mw+debian@ilk.net>:
Extra info received and forwarded to list. Copy sent to Arthur de Jong <adejong@debian.org>. (Mon, 27 Sep 2010 15:39:06 GMT)

Message #22 received at 596983@bugs.debian.org:

From: Matthias Wamser <mw+debian@ilk.net>
To: 596983@bugs.debian.org
Subject: Re: Bug#596983 closed by Arthur de Jong <adejong@debian.org> (Bug#596983: fixed in nss-pam-ldapd 0.7.10)
Date: Mon, 27 Sep 2010 17:24:35 +0200
On 24.09.10 09:51, Debian Bug Tracking System wrote:
> #596983: libnss-ldapd: Fallback to secondary ldap server does not work as expected
> 
> It has been closed by Arthur de Jong <adejong@debian.org>.

Hi,

I think the problem is only partially solved.

Without SSL I see the following behaviour:

If the first LDAP server fails, nslcd reconnects to the second server
after a minimum of 10 seconds, even if all possible timelimits in
nslcd.conf are set to 1 second.

If the first LDAP server becomes available again, nslcd still uses the
second server and does not seem to reconnect to the first server, even
after waiting some minutes (what does idle_timelimit do, nothing?).

If I enable SSL, the reconnect mechanism does not work at all.
nslcd never tries to connect to the second LDAP server.

I tested with version 0.7.10 from unstable.

regards,
Matthias Wamser
-- 
   Matthias Wamser, Senior Systems Engineer, mailto: mw@ilk.net
   ILK Internet GmbH, Am Sandfeld 15 a, D-76149 Karlsruhe
   Tel: +49 (0) 721 9100 0, http://www.ilk.net
   Geschaeftsfuehrer Matthias Felger, AG Mannheim, HRB 107037




Information forwarded to debian-bugs-dist@lists.debian.org, Arthur de Jong <adejong@debian.org>:
Bug#596983; Package libnss-ldapd. (Mon, 27 Sep 2010 21:39:06 GMT)

Acknowledgement sent to 596983@bugs.debian.org:
Extra info received and forwarded to list. Copy sent to Arthur de Jong <adejong@debian.org>. (Mon, 27 Sep 2010 21:39:06 GMT)

Message #27 received at 596983@bugs.debian.org:

From: Arthur de Jong <adejong@debian.org>
To: Matthias Wamser <mw+debian@ilk.net>, 596983@bugs.debian.org
Subject: Re: Bug#596983: closed by Arthur de Jong <adejong@debian.org> (Bug#596983: fixed in nss-pam-ldapd 0.7.10)
Date: Mon, 27 Sep 2010 23:36:02 +0200
On Mon, 2010-09-27 at 17:24 +0200, Matthias Wamser wrote:
> Without ssl I have the following behaviour:
> 
> If the first LDAP Server fails, nslcd reconnects to the second server
> after a minimum time of 10 seconds. Even if all possible timelimits in
> nslcd.conf are set to 1 second.

I cannot fully reproduce this in my test environment. In my setup I have
slapd listening on both ports 389 and 390. Both URIs are
in /etc/nslcd.conf. Also, to simplify testing, I have threads 1
configured.

I start slapd and then nslcd. I do getent passwd someuser and everything
works (now nslcd has a connection open to slapd). I kill the
connection with:
  iptables -A OUTPUT -d 127.0.0.1 -p tcp --dport 389 -j DROP

If I now retry the getent it takes exactly timelimit + bind_timelimit
seconds. First the search on the existing connection takes timelimit
seconds to time out, at which point the connection is dropped. When a
failure occurs on an existing connection no fail-over is done; nslcd
first tries to reconnect to the current server, which takes another
bind_timelimit seconds.

Arguably, nslcd could fail over to the second LDAP server on first error
but this is a bit more tricky to get right (so may not be fixable for
squeeze). Currently, the fail-over is only implemented when the initial
connection fails.
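
The timing described above can be modelled with a short Python sketch.
This is not nslcd's code (nslcd is written in C): Server, lookup_delay
and the example host names are hypothetical, and the model assumes that
a search on a dead connection always costs the full timelimit and that
each failed connection attempt costs the full bind_timelimit.

```python
from dataclasses import dataclass


@dataclass
class Server:
    uri: str
    reachable: bool


def lookup_delay(servers, current, has_connection, timelimit, bind_timelimit):
    """Return (seconds spent waiting, index of the answering server or None)."""
    waited = 0
    if has_connection:
        if servers[current].reachable:
            return waited, current  # existing connection still works
        # The search on the dead connection runs into timelimit, after
        # which the connection is dropped.
        waited += timelimit
    # (Re)connect, starting with the current server; fail-over to the
    # next URI only happens when a connection attempt fails.
    for step in range(len(servers)):
        idx = (current + step) % len(servers)
        if servers[idx].reachable:
            return waited, idx
        waited += bind_timelimit
    return waited, None


# Hypothetical host names; the first server is blocked, the second is up.
servers = [Server("ldap://ldap1.example/", False),
           Server("ldap://ldap2.example/", True)]
print(lookup_delay(servers, 0, True, timelimit=1, bind_timelimit=1))
```

With an existing connection to the blocked server the model yields
timelimit + bind_timelimit seconds of waiting; with has_connection=False
(a freshly started daemon) only bind_timelimit is spent.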

> If the first ldap server is available again, nslcd still uses the second
> server and does not seem to reconnect to the first server, even if
> waiting some minutes (what does the idle_timelimit, nothing?).

Again, nslcd only switches LDAP server on bind error so it will stay
with the second server until either that one fails or nslcd is
restarted. There is probably room for improvement here though.
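
A minimal Python sketch of this "sticky" selection, for illustration
only. StickySelector and can_bind are hypothetical names, and wrapping
around to the start of the URI list after the last server fails is an
assumption rather than something stated in this report.

```python
class StickySelector:
    """Keep using the current server until a bind to it fails."""

    def __init__(self, uris):
        self.uris = list(uris)
        self.current = 0  # index of the server currently in use

    def connect(self, can_bind):
        """Try servers starting at the current one; can_bind(uri) -> bool."""
        for step in range(len(self.uris)):
            idx = (self.current + step) % len(self.uris)
            if can_bind(self.uris[idx]):
                self.current = idx  # later lookups stay on this server
                return self.uris[idx]
        return None  # no server accepted the bind


# Hypothetical URIs: ldap1 is down at first, then comes back.
up = {"ldap://ldap1.example/": False, "ldap://ldap2.example/": True}
selector = StickySelector(up)
print(selector.connect(up.get))
up["ldap://ldap1.example/"] = True
print(selector.connect(up.get))
```

Both calls select the second URI: once the selector has moved on, the
first server coming back does not trigger a switch, which matches the
behaviour reported above.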

> If I enable ssl the reconnect mechanism does not work at all.
> nslcd never tries to connect to the second ldap server.

nslcd hangs while disconnecting from the first server. It hangs inside
OpenLDAP code. For a non-TLS bind the unbind() only writes some data and
doesn't care about any answers; for an unbind() of a TLS connection some
response is expected from the server.

I'll see if something can be done about this but I suspect it may be a
problem within OpenLDAP.

> I tested with version 0.7.10 from unstable.

Anyway, thanks for the thorough testing.

-- 
-- arthur - adejong@debian.org - http://people.debian.org/~adejong --

Information forwarded to debian-bugs-dist@lists.debian.org, Arthur de Jong <adejong@debian.org>:
Bug#596983; Package libnss-ldapd. (Tue, 12 Oct 2010 21:31:36 GMT)

Acknowledgement sent to 596983@bugs.debian.org:
Extra info received and forwarded to list. Copy sent to Arthur de Jong <adejong@debian.org>. (Tue, 12 Oct 2010 21:31:36 GMT)

Message #32 received at 596983@bugs.debian.org:

From: Arthur de Jong <adejong@debian.org>
To: Matthias Wamser <mw+debian@ilk.net>, 596983 <596983@bugs.debian.org>
Subject: Re: Bug#596983: closed by Arthur de Jong <adejong@debian.org> (Bug#596983: fixed in nss-pam-ldapd 0.7.10)
Date: Tue, 12 Oct 2010 23:23:47 +0200
On Mon, 2010-09-27 at 23:36 +0200, Arthur de Jong wrote:
> nslcd hangs while disconnecting from the first server. It hangs inside
> OpenLDAP code. For a non-TLS bind the unbind() only writes some data and
> doesn't care about any answers; for an unbind() of a TLS connection some
> response is expected from the server.
> 
> I'll see if something can be done about this but I suspect it may be a
> problem within OpenLDAP.

I've implemented a workaround in nslcd. It sets socket receiving and
sending timeouts to the timelimit option of nslcd.conf [0]. This fixes
the problem in my test set-up.

I've also reported a bug on the OpenLDAP bug tracker [1].

[0] http://arthurdejong.org/viewvc/nss-pam-ldapd?view=rev&revision=1264
[1] http://www.openldap.org/its/index.cgi/Incoming?id=6673
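
As an illustration of the general technique, setting receive and send
timeouts on a socket could look like the Python sketch below. It is not
the nslcd change itself (that is C code in the revision linked above and
is not reproduced here). set_socket_timeouts is a hypothetical helper,
and both the use of SO_RCVTIMEO/SO_SNDTIMEO and the struct timeval
layout are assumptions that hold on common Linux systems.

```python
import socket
import struct


def set_socket_timeouts(sock, seconds):
    """Apply the same receive and send timeout to an existing socket."""
    # struct timeval packed as two C longs (tv_sec, tv_usec); this layout
    # is an assumption that holds on common Linux systems.
    timeval = struct.pack("ll", int(seconds), 0)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVTIMEO, timeval)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDTIMEO, timeval)


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
set_socket_timeouts(sock, 1)  # 1 stands in for the timelimit value
sock.close()
```

With such timeouts in place a blocking read or write on the socket
returns with an error once the limit expires instead of waiting
indefinitely for a peer that never answers.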

-- 
-- arthur - adejong@debian.org - http://people.debian.org/~adejong --

Bug archived. Request was from Debbugs Internal Request <owner@bugs.debian.org> to internal_control@bugs.debian.org. (Wed, 10 Nov 2010 07:33:17 GMT)

Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Wed Apr 16 17:17:30 2014; Machine Name: buxtehude.debian.org

Debian Bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.