Debian Bug report logs - #303057
slapd goes into endless sched_yield() loop


Package: slapd; Maintainer for slapd is Debian OpenLDAP Maintainers <pkg-openldap-devel@lists.alioth.debian.org>; Source for slapd is src:openldap.

Reported by: Wolfgang Kohnen <wollie@tzi.de>

Date: Mon, 4 Apr 2005 15:33:05 UTC

Severity: serious

Tags: unreproducible

Merged with 255276, 302992

Found in versions 2.1.30-1, 2.1.30-3

Fixed in version openldap2.2/2.2.23-6

Done: Torsten Landschoff <torsten@debian.org>

Bug is archived. No further changes may be made.



Report forwarded to debian-bugs-dist@lists.debian.org, Torsten Landschoff <torsten@debian.org>:
Bug#303057; Package slapd.

Acknowledgement sent to Wolfgang Kohnen <wollie@tzi.de>:
New Bug report received and forwarded. Copy sent to Torsten Landschoff <torsten@debian.org>.

Message #5 received at submit@bugs.debian.org:

From: Wolfgang Kohnen <wollie@tzi.de>
To: Debian Bug Tracking System <submit@bugs.debian.org>
Subject: slapd goes into endless sched_yield() loop
Date: Mon, 04 Apr 2005 17:25:35 +0200
Package: slapd
Version: 2.1.30-3
Severity: important

Sometimes all OpenLDAP programs (slapd, slapcat, slapindex) that
access my bdb backend eat up all CPU cycles and stop responding to
anything except signals 2 and 4 (not 15; I didn't check any others).

Increasing the loglevel didn't show anything interesting
(to me; I am no programmer).  I ran strace on slapindex and slapcat,
and both times it showed an endless series of sched_yield() calls.
A strace of slapcat can be found here:

	http://duplo.lis.bremen.de/~wollie/slapcat.strace
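For anyone wanting to check their own trace for the same symptom, here is a minimal Python sketch that tallies syscall names in an strace log and reports whether sched_yield() dominates. It assumes plain `strace` or `strace -f` output without timestamps (lines such as `sched_yield() = 0`, optionally prefixed by `[pid N]` or a bare PID); the helper names and the 90% threshold are illustrative assumptions, not part of the report.

```python
import re
from collections import Counter

# Optional "[pid N] " prefix (strace -f on a terminal) or bare "N " PID
# prefix (strace -f -o file), then the syscall name up to its "(".
_SYSCALL = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?([a-z_][a-z0-9_]*)\(")


def summarize_syscalls(lines):
    """Count syscall names in strace output lines (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        m = _SYSCALL.match(line.strip())
        if m:
            counts[m.group(1)] += 1
    return counts


def looks_like_yield_spin(lines, threshold=0.9):
    """True if sched_yield makes up at least `threshold` of all calls."""
    counts = summarize_syscalls(lines)
    total = sum(counts.values())
    return total > 0 and counts["sched_yield"] / total >= threshold
```

Passing an open file object, e.g. `looks_like_yield_spin(open("slapcat.strace"))`, checks a saved trace; the threshold can be lowered if the spin is interleaved with other calls.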

Please drop me a line if you would like to have a look at my exact
configuration.  Maybe the index lines in slapd.conf are interesting (I
am using gosa):

index default sub
index uid,mail eq
index gosaMailAlternateAddress,gosaMailForwardingAddress eq
index cn,sn,givenName,ou pres,eq,sub
index objectClass pres,eq
index uidNumber,gidNumber,memberuid eq
index gosaSubtreeACL,gosaObject,gosaUser pres,eq

Greets,
Wollie

-- System Information:
Debian Release: 3.1
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: i386 (i686)
Kernel: Linux 2.6.8
Locale: LANG=de_DE@euro, LC_CTYPE=de_DE@euro (charmap=ISO-8859-15)

Versions of packages slapd depends on:
ii  coreutils [fileutils]       5.2.1-2      The GNU core utilities
ii  debconf                     1.4.30.11    Debian configuration management sy
ii  libc6                       2.3.2.ds1-20 GNU C Library: Shared libraries an
ii  libdb4.2                    4.2.52-18    Berkeley v4.2 Database Libraries [
ii  libgcrypt11                 1.2.0-4      LGPL Crypto library - runtime libr
ii  libgnutls11                 1.0.16-9     GNU TLS library - runtime library
ii  libgpg-error0               1.0-1        library for common error values an
ii  libiodbc2                   3.52.2-3     iODBC Driver Manager
ii  libldap2                    2.1.30-3     OpenLDAP libraries
ii  libltdl3                    1.5.6-4      A system independent dlopen wrappe
ii  libsasl2                    2.1.19-1.5   Authentication abstraction library
ii  libslp1                     1.0.11a-2    OpenSLP libraries
ii  libwrap0                    7.6.dbs-8    Wietse Venema's TCP wrappers libra
ii  perl [libmime-base64-perl]  5.8.4-8      Larry Wall's Practical Extraction 
ii  psmisc                      21.5-1       Utilities that use the proc filesy
ii  zlib1g                      1:1.2.2-3    compression library - runtime

-- debconf information:
  slapd/password_mismatch:
  slapd/fix_directory: true
  slapd/invalid_config: true
* shared/organization: sub.example.com
  slapd/upgrade_slapcat_failure:
  slapd/upgrade_slapadd_failure:
  slapd/backend: BDB
* slapd/allow_ldap_v2: false
  slapd/no_configuration: false
  slapd/move_old_database: true
  slapd/suffix_change: false
  slapd/slave_databases_require_updateref:
  slapd/autoconf_modules: true
  slapd/purge_database: false
  slapd/admin:
* slapd/domain: sub.example.com



Information forwarded to debian-bugs-dist@lists.debian.org:
Bug#303057; Package slapd.

Acknowledgement sent to Torsten Landschoff <torsten@debian.org>:
Extra info received and forwarded to list.

Message #10 received at 303057@bugs.debian.org:

From: Torsten Landschoff <torsten@debian.org>
To: Wolfgang Kohnen <wollie@tzi.de>, 303057@bugs.debian.org
Subject: Re: Bug#303057: slapd goes into endless sched_yield() loop
Date: Mon, 4 Apr 2005 18:31:48 +0200
Hi Wolfgang, 

On Mon, Apr 04, 2005 at 05:25:35PM +0200, Wolfgang Kohnen wrote:
> Sometimes all OpenLDAP programs (slapd, slapcat, slapindex) that
> access my bdb backend eat up all CPU cycles and stop responding to
> anything except signals 2 and 4 (not 15; I didn't check any others).

Known problem. Please try upgrading to 2.2.23. 2.1.30 will not ship with
sarge because of these and other problems. Running db4.2_recover in the
database directory may temporarily fix those problems but they are going
to strike again.

2.2.23 is only in unstable for now - sorry. Not sure if the dependencies
are fulfilled in testing already.

Greetings

	Torsten

Severity set to `normal'. Request was from t.landschoff@gmx.net (Torsten Landschoff) to control@bugs.debian.org.

Merged 255276 302992 303057. Request was from t.landschoff@gmx.net (Torsten Landschoff) to control@bugs.debian.org.

Information forwarded to debian-bugs-dist@lists.debian.org, Torsten Landschoff <torsten@debian.org>:
Bug#303057; Package slapd.

Acknowledgement sent to Sven Hartge <sven.hartge@mni.fh-giessen.de>:
Extra info received and forwarded to list. Copy sent to Torsten Landschoff <torsten@debian.org>.

Message #19 received at 303057@bugs.debian.org:

From: Sven Hartge <sven.hartge@mni.fh-giessen.de>
To: 303057@bugs.debian.org
Subject: Re: slapd goes into endless sched_yield() loop
Date: Tue, 05 Apr 2005 13:53:26 +0200
> On Mon, Apr 04, 2005 at 05:25:35PM +0200, Wolfgang Kohnen wrote:
>> Sometimes all OpenLDAP programs (slapd, slapcat, slapindex) that
>> access my bdb backend eat up all CPU cycles and stop responding to
>> anything except signals 2 and 4 (not 15; I didn't check any others).

> Known problem. Please try upgrading to 2.2.23. 2.1.30 will not ship 
> with sarge because of these and other problems. Running db4.2_recover
> in the database directory may temporarily fix those problems but they
> are going to strike again.

> 2.2.23 is only in unstable for now - sorry. Not sure if the
> dependencies are fulfilled in testing already.

I am deeply sorry, but I have to report 2.2.23 hitting the same problem 
for me.

How can this be? I thought all those bugs were eliminated in 2.2 using db4.2?

Regards,
Sven.



Information forwarded to debian-bugs-dist@lists.debian.org:
Bug#303057; Package slapd.

Acknowledgement sent to Torsten Landschoff <torsten@debian.org>:
Extra info received and forwarded to list.

Message #24 received at 303057@bugs.debian.org:

From: Torsten Landschoff <torsten@debian.org>
To: Sven Hartge <sven.hartge@mni.fh-giessen.de>, 303057@bugs.debian.org
Subject: Re: Bug#303057: slapd goes into endless sched_yield() loop
Date: Wed, 6 Apr 2005 15:22:45 +0200
Hi Sven, 

On Tue, Apr 05, 2005 at 01:53:26PM +0200, Sven Hartge wrote:
> >Known problem. Please try upgrading to 2.2.23. 2.1.30 will not ship 
> >with sarge because of these and other problems. Running db4.2_recover
> >in the database directory may temporarily fix those problems but they
> >are going to strike again.
> 
> I am deeply sorry, but I have to report 2.2.23 hitting the same problem 
> for me.
> 
> How can this be? I thought all those bugs were eliminated in 2.2 using db4.2?

I thought so as well. Did you use a DB_CONFIG file suited for your
setup? Upstream keeps talking about the problems not being caused by
corruption but by thrashing. Probably in that case the maintainer
scripts /MUST/ install a basic DB_CONFIG which will work for most cases.
I am thinking of something along the lines of an 8MB cache.

I would be very grateful if you could try that on your setup. See
/usr/share/doc/slapd/examples/DB_CONFIG for a template.
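Along the lines suggested here, a default DB_CONFIG could be generated with a small cache. The Python sketch below renders one; the function name is hypothetical and the 8MB default merely echoes the figure above. It assumes `set_cachesize` takes three arguments (gigabytes, bytes, number of cache regions), matching the DB_CONFIG lines quoted elsewhere in this report, and uses only the lock directives that also appear in this report.

```python
GIB = 1024 ** 3


def render_db_config(cache_bytes=8 * 1024 * 1024, lock_limit=None):
    """Render a minimal DB_CONFIG text (illustrative sketch, not Debian's)."""
    # set_cachesize takes <gbytes> <bytes> <ncache>; split the total size so
    # that the <bytes> component stays below 1 GiB.
    gbytes, rest = divmod(cache_bytes, GIB)
    lines = [f"set_cachesize {gbytes} {rest} 0"]
    if lock_limit is not None:
        # Optionally raise the three lock-table limits together.
        for directive in ("set_lk_max_locks", "set_lk_max_lockers",
                          "set_lk_max_objects"):
            lines.append(f"{directive} {lock_limit}")
    return "\n".join(lines) + "\n"
```

Called with no arguments it yields the single line `set_cachesize 0 8388608 0`.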

Thanks

	Torsten

Information forwarded to debian-bugs-dist@lists.debian.org, Torsten Landschoff <torsten@debian.org>:
Bug#303057; Package slapd.

Acknowledgement sent to Sven Hartge <hartge@ds9.argh.org>:
Extra info received and forwarded to list. Copy sent to Torsten Landschoff <torsten@debian.org>.

Message #29 received at 303057@bugs.debian.org:

From: Sven Hartge <hartge@ds9.argh.org>
To: Torsten Landschoff <torsten@debian.org>
Cc: 303057@bugs.debian.org
Subject: Re: Bug#303057: slapd goes into endless sched_yield() loop
Date: Wed, 06 Apr 2005 15:32:11 +0200
Torsten Landschoff wrote:

>> I am deeply sorry, but I have to report 2.2.23 hitting the same problem 
>> for me.

>> How can this be? I thought all those bugs were eliminated in 2.2 using db4.2?

> I thought so as well. Did you use a DB_CONFIG file suited for your
> setup?

Yes, of course.

> I am thinking of something along the lines of an 8MB cache.

#txn_checkpoint         128     15      1
set_cachesize           0       252428800        0
set_lk_max_objects      100000
set_lk_max_locks        100000
set_lg_regionmax        1048576
set_lg_max              8388608
set_lg_bsize            2097152
set_lg_dir              /var/lib/ldap/logs/
#set_lk_detect DB_LOCK_DEFAULT
set_tmp_dir             /tmp/
#set_flags DB_TXN_NOSYNC
#set_flags DB_TXN_NOT_DURABLE

I set those enormous lock limits after one of my replicas complained
about running out of locks and the openldap mailing list suggested
increasing this number. Since I really need those servers, I went a
little over the top with this value.

(txn_checkpoint does not work, I don't know why. bdb4.2 does not 
recognize it.)
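As a side note on the numbers: `set_cachesize 0 252428800 0` requests 0 GB plus 252428800 bytes, i.e. about 252 MB (roughly 240.7 MiB). A minimal Python sketch for reading such a file, assuming simple whitespace-separated `directive args...` lines and treating lines that start with `#` as comments; the function names are hypothetical.

```python
def parse_db_config(text):
    """Map each active directive to its argument list (sketch).

    Blank lines and lines starting with '#' are skipped. If a directive
    appears more than once (e.g. set_flags), only its last occurrence is
    kept in this simplified version.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        directive, *args = line.split()
        settings[directive] = args
    return settings


def cache_size_bytes(settings):
    """Total cache size from 'set_cachesize <gbytes> <bytes> <ncache>'."""
    gbytes, nbytes, _ncache = (int(v) for v in settings["set_cachesize"])
    return gbytes * 1024 ** 3 + nbytes
```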

Regards,
S°

-- 
Sven Hartge -- professional Unix geek and all-time nerd
My thoughts on the net: http://sven.formvision.de/blog/



Information forwarded to debian-bugs-dist@lists.debian.org:
Bug#303057; Package slapd.

Acknowledgement sent to Torsten Landschoff <torsten@debian.org>:
Extra info received and forwarded to list.

Message #34 received at 303057@bugs.debian.org:

From: Torsten Landschoff <torsten@debian.org>
To: Sven Hartge <hartge@ds9.argh.org>, 303057@bugs.debian.org
Subject: Re: Bug#303057: slapd goes into endless sched_yield() loop
Date: Wed, 6 Apr 2005 21:01:31 +0200
Hi Sven, 

On Wed, Apr 06, 2005 at 03:32:11PM +0200, Sven Hartge wrote:
 
> >I thought so as well. Did you use a DB_CONFIG file suited for your
> >setup?
> 
> Yes, of course.
 
> #txn_checkpoint         128     15      1
> set_cachesize           0       252428800        0
> set_lk_max_objects      100000
> set_lk_max_locks        100000
> set_lg_regionmax        1048576
> set_lg_max              8388608
> set_lg_bsize            2097152
> set_lg_dir              /var/lib/ldap/logs/
> #set_lk_detect DB_LOCK_DEFAULT
> set_tmp_dir             /tmp/
> #set_flags DB_TXN_NOSYNC
> #set_flags DB_TXN_NOT_DURABLE

Hmm, very interesting. I wonder why it works at Stanford and apparently
nowhere else. I was notified today that putting a DB_CONFIG file into
the directory has no effect after the initial database was created so 
I'd like to ask if you did it "in time".

Sorry for all the problems; I'd really like to have them
fixed!

Thanks for the quick feedback in any case.

Greetings

	Torsten

Information forwarded to debian-bugs-dist@lists.debian.org, Torsten Landschoff <torsten@debian.org>:
Bug#303057; Package slapd.

Acknowledgement sent to Sven Hartge <hartge@ds9.argh.org>:
Extra info received and forwarded to list. Copy sent to Torsten Landschoff <torsten@debian.org>.

Message #39 received at 303057@bugs.debian.org:

From: Sven Hartge <hartge@ds9.argh.org>
To: Torsten Landschoff <torsten@debian.org>
Cc: 303057@bugs.debian.org
Subject: Re: Bug#303057: slapd goes into endless sched_yield() loop
Date: Wed, 6 Apr 2005 21:27:19 +0200 (CEST)
At 21:01 on 06.04.05, Torsten Landschoff wrote:
> On Wed, Apr 06, 2005 at 03:32:11PM +0200, Sven Hartge wrote:

>>> I thought so as well. Did you use a DB_CONFIG file suited for your
>>> setup?

>> Yes, of course.  

>> #txn_checkpoint         128     15      1
>> set_cachesize           0       252428800        0
>> set_lk_max_objects      100000
>> set_lk_max_locks        100000
>> set_lg_regionmax        1048576
>> set_lg_max              8388608
>> set_lg_bsize            2097152
>> set_lg_dir              /var/lib/ldap/logs/
>> #set_lk_detect DB_LOCK_DEFAULT
>> set_tmp_dir             /tmp/
>> #set_flags DB_TXN_NOSYNC
>> #set_flags DB_TXN_NOT_DURABLE

> Hmm, very interesting. I wonder why it works at Stanford and apparently
> nowhere else. I was notified today that putting a DB_CONFIG file into
> the directory has no effect after the initial database was created so 
> I'd like to ask if you did it "in time".

Of course. First I copied this DB_CONFIG into /var/lib/ldap, then I
uncommented the last two lines, loaded my data with slapadd, commented
the last two lines out again and started slapd.
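The load procedure described here (uncomment the two set_flags lines, run slapadd, comment them out again before starting slapd) lends itself to scripting. A minimal Python sketch that toggles just those two lines in the DB_CONFIG text; the function name is hypothetical, and reading and writing the actual file under /var/lib/ldap is left out.

```python
# The two flags enabled only while bulk-loading with slapadd.
BULK_LOAD_FLAGS = ("DB_TXN_NOSYNC", "DB_TXN_NOT_DURABLE")


def toggle_bulk_load_flags(text, enable):
    """Uncomment (enable=True) or comment out the bulk-load set_flags lines."""
    out = []
    for line in text.splitlines():
        bare = line.lstrip("#").strip()
        parts = bare.split()
        if (len(parts) == 2 and parts[0] == "set_flags"
                and parts[1] in BULK_LOAD_FLAGS):
            line = bare if enable else "#" + bare
        out.append(line)  # all other lines pass through unchanged
    return "\n".join(out) + "\n"
```

Running it with `enable=True` before slapadd and `enable=False` afterwards reproduces the manual steps.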

> Sorry for all the problems; I'd really like to have them
> fixed!

Right now I am running with LD_ASSUME_KERNEL=2.4.1 as suggested in
$the_other_bug. So far there have been no problems, but since this
sched_yield() problem takes some time to show up, I don't know whether
this really is the solution or whether I am just lucky right now.

Regards,
Sven.

-- 
Sven Hartge -- professional Unix geek and all-time nerd
My thoughts on the net: http://sven.formvision.de/blog/



Information forwarded to debian-bugs-dist@lists.debian.org, Torsten Landschoff <torsten@debian.org>:
Bug#303057; Package slapd.

Acknowledgement sent to Sven Hartge <hartge@ds9.argh.org>:
Extra info received and forwarded to list. Copy sent to Torsten Landschoff <torsten@debian.org>.

Message #44 received at 303057@bugs.debian.org:

From: Sven Hartge <hartge@ds9.argh.org>
To: Torsten Landschoff <torsten@debian.org>
Cc: 303057@bugs.debian.org
Subject: Re: Bug#303057: slapd goes into endless sched_yield() loop
Date: Wed, 6 Apr 2005 22:08:31 +0200 (CEST)
At 21:01 on 06.04.05, Torsten Landschoff wrote:

> Hmm, very interesting. I wonder why it works at Stanford and apparently
> nowhere else. I was notified today that putting a DB_CONFIG file into
> the directory has no effect after the initial database was created so 
> I'd like to ask if you did it "in time".

BTW: I also have two NetBSD 2.0 machines running a replica of the tree,
and those two have never had any problems so far, neither with
2.2.19/db4.2 nor with 2.2.20/db4.3.

It is totally frustrating not to be able to reproduce this bug in a
timely manner.

Regards,
Sven.

-- 
Sven Hartge -- professional Unix geek and all-time nerd
My thoughts on the net: http://sven.formvision.de/blog/



Information forwarded to debian-bugs-dist@lists.debian.org:
Bug#303057; Package slapd.

Acknowledgement sent to Torsten Landschoff <torsten@debian.org>:
Extra info received and forwarded to list.

Message #49 received at 303057@bugs.debian.org:

From: Torsten Landschoff <torsten@debian.org>
To: Sven Hartge <hartge@ds9.argh.org>, 303057@bugs.debian.org
Subject: Re: Bug#303057: slapd goes into endless sched_yield() loop
Date: Mon, 11 Apr 2005 09:24:27 +0200
Hi Sven, 

On Wed, Apr 06, 2005 at 09:27:19PM +0200, Sven Hartge wrote:
 
> Right now I am running with LD_ASSUME_KERNEL=2.4.1 as suggested in
> $the_other_bug. So far there have been no problems, but since this
> sched_yield() problem takes some time to show up, I don't know whether
> this really is the solution or whether I am just lucky right now.

Has the problem shown up by now, or does LD_ASSUME_KERNEL=2.4.1 really
help? If it helps, I am thinking about adding it to slapd.init as a
workaround...

Greetings

	Torsten

Information forwarded to debian-bugs-dist@lists.debian.org, Torsten Landschoff <torsten@debian.org>:
Bug#303057; Package slapd.

Acknowledgement sent to Sven Hartge <hartge@ds9.argh.org>:
Extra info received and forwarded to list. Copy sent to Torsten Landschoff <torsten@debian.org>.

Message #54 received at 303057@bugs.debian.org:

From: Sven Hartge <hartge@ds9.argh.org>
To: Torsten Landschoff <torsten@debian.org>
Cc: 303057@bugs.debian.org
Subject: Re: Bug#303057: slapd goes into endless sched_yield() loop
Date: Mon, 11 Apr 2005 11:21:54 +0200 (CEST)
At 09:24 on 11.04.05, Torsten Landschoff wrote:
> On Wed, Apr 06, 2005 at 09:27:19PM +0200, Sven Hartge wrote:
  
>> Right now I am running with LD_ASSUME_KERNEL=2.4.1 as suggested in
>> $the_other_bug. So far there have been no problems, but since this
>> sched_yield() problem takes some time to show up, I don't know whether
>> this really is the solution or whether I am just lucky right now.

> Has the problem shown up by now, or does LD_ASSUME_KERNEL=2.4.1 really
> help? If it helps, I am thinking about adding it to slapd.init as a
> workaround...

Nope, didn't help. 

S°

-- 
Sven Hartge -- professional Unix geek and all-time nerd
My thoughts on the net: http://sven.formvision.de/blog/



Information forwarded to debian-bugs-dist@lists.debian.org, Torsten Landschoff <torsten@debian.org>:
Bug#303057; Package slapd.

Acknowledgement sent to Sven Hartge <hartge@ds9.argh.org>:
Extra info received and forwarded to list. Copy sent to Torsten Landschoff <torsten@debian.org>.

Message #59 received at 303057@bugs.debian.org:

From: Sven Hartge <hartge@ds9.argh.org>
To: Torsten Landschoff <torsten@debian.org>
Cc: 303057@bugs.debian.org
Subject: Re: Bug#303057: slapd goes into endless sched_yield() loop
Date: Mon, 11 Apr 2005 23:05:21 +0200 (CEST)
Hi.

Could we please raise this bug to at least "important"? Every day at
least one of my 8 replicas goes belly-up in the sched_yield() loop.
Right now I even consider this bug RC-worthy.

Regards,
Sven.

-- 
Sven Hartge -- professional Unix geek and all-time nerd
My thoughts on the net: http://sven.formvision.de/blog/



Severity set to `serious'. Request was from t.landschoff@gmx.net (Torsten Landschoff) to control@bugs.debian.org.

Information forwarded to debian-bugs-dist@lists.debian.org, Torsten Landschoff <torsten@debian.org>:
Bug#303057; Package slapd.

Acknowledgement sent to Sven Hartge <hartge@ds9.argh.org>:
Extra info received and forwarded to list. Copy sent to Torsten Landschoff <torsten@debian.org>.

Message #66 received at 303057@bugs.debian.org:

From: Sven Hartge <hartge@ds9.argh.org>
To: Torsten Landschoff <torsten@debian.org>
Cc: 303057@bugs.debian.org
Subject: Re: Bug#303057: slapd goes into endless sched_yield() loop
Date: Mon, 11 Apr 2005 23:56:23 +0200 (CEST)
Hi.

After the last episode of "killall slapd; db4.2_recover; /etc/init.d/slapd start"
my DB_CONFIG looks like this:

#txn_checkpoint         128     15      1
set_cachesize           0       252428800        0
set_lk_max_objects      100000
set_lk_max_locks        100000
#
set_lk_max_lockers      100000
#
set_lg_regionmax        1048576
set_lg_max              8388608
set_lg_bsize            2097152
set_lg_dir              /var/lib/ldap/logs/
#set_lk_detect DB_LOCK_DEFAULT
set_tmp_dir             /tmp/
#set_flags DB_TXN_NOSYNC
#set_flags DB_TXN_NOT_DURABLE

Note the enormous amount of possible locks, lockers and lockable objects. 
So far, only increasing this amount seems to be the way to circumvent the 
dreaded sched_yield()-loop.

My last change is

  set_lk_max_lockers      100000

which was still at its default value and seems to be the _real_ culprit, as per 
ITS#2030:

http://www.openldap.org/its/index.cgi/Software%20Bugs?id=2030;selectid=2030;usearchives=1
(Note how old this bug report to the OpenLDAP ITS is.)

Right now I am running some stress tests on my systems, but those %$/&%$§
bastards never locked up when I was actively trying to push them over the
edge.

It would be really helpful if anybody experiencing these problems could
check whether they still occur with increased locker settings (as seen
above).

Regards,
Sven.

-- 
Sven Hartge -- professional Unix geek and all-time nerd
My thoughts on the net: http://sven.formvision.de/blog/



Information forwarded to debian-bugs-dist@lists.debian.org, Torsten Landschoff <torsten@debian.org>:
Bug#303057; Package slapd.

Acknowledgement sent to Sven Hartge <hartge@ds9.argh.org>:
Extra info received and forwarded to list. Copy sent to Torsten Landschoff <torsten@debian.org>.

Message #71 received at 303057@bugs.debian.org:

From: Sven Hartge <hartge@ds9.argh.org>
To: Torsten Landschoff <torsten@debian.org>
Cc: 303057@bugs.debian.org
Subject: Re: Bug#303057: slapd goes into endless sched_yield() loop
Date: Tue, 12 Apr 2005 00:58:12 +0200 (CEST)
At 23:56 on 11.04.05, Sven Hartge wrote:

[Sorry for spamming this bug report, but I _really_ need to get this going 
*fast*.]

> My last change is
> 
>   set_lk_max_lockers      100000
> 
> which was still at its default value and seems to be the _real_ culprit, as per 
> ITS#2030:
> 
> http://www.openldap.org/its/index.cgi/Software%20Bugs?id=2030;selectid=2030;usearchives=1
> (Note how old this bug report to the OpenLDAP ITS is.)

After torturing my setup with different scripts and trying to recreate
the normal workload, while running

  watch -n1 -d "db4.2_stat -c"

in parallel for some time, I am confident that the sched_yield() loop
occurs because the bdb backend runs out of lockers. At least this is what
I get from ITS#2030 and from various other resources (mostly documentation
from Sleepycat).
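The same check can be made programmatically by comparing the peak number of lockers with the configured maximum. A Python sketch, assuming `db_stat -c` prints one statistic per line as a value, a tab, and a description; the two description strings used below are assumptions and should be checked against the actual output of db4.2_stat on your system.

```python
def parse_db_stat(text):
    """Map description -> value string for 'value<TAB>description' lines."""
    stats = {}
    for line in text.splitlines():
        value, sep, desc = line.partition("\t")
        if not sep:
            continue  # section headers and other lines without a tab
        stats[desc.strip().rstrip(".")] = value.strip()
    return stats


def locker_usage(stats):
    """Peak lockers as a fraction of the configured maximum (assumed keys)."""
    limit = int(stats["Maximum number of lockers possible"])
    peak = int(stats["Maximum number of lockers at any one time"])
    return peak / limit
```

With made-up input `"1000\tMaximum number of lockers possible.\n950\tMaximum number of lockers at any one time.\n"`, `locker_usage(parse_db_stat(...))` gives 0.95, i.e. the locker table was 95% full at its peak. The text could come from something like `subprocess.run(["db4.2_stat", "-c", "-h", "/var/lib/ldap"], capture_output=True, text=True).stdout`.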

So I suggest that the package create a DB_CONFIG with _at least_ 5000
lockers, locks and lock objects. The default of 1000 is just too low and
will be exhausted in no time, even by a small database.
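Applying that suggestion to an existing DB_CONFIG could look like the following Python sketch, which raises (but never lowers) the three lock limits to a floor of 5000 and appends any that are missing. The function name is hypothetical; commented-out directives are treated as missing.

```python
LOCK_DIRECTIVES = ("set_lk_max_locks", "set_lk_max_lockers",
                   "set_lk_max_objects")


def ensure_lock_floor(text, floor=5000):
    """Raise (never lower) the three lock limits in a DB_CONFIG to `floor`."""
    seen = set()
    out = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in LOCK_DIRECTIVES:
            seen.add(parts[0])
            if int(parts[1]) < floor:
                line = f"{parts[0]} {floor}"
        out.append(line)
    # Append directives that were absent, since Berkeley DB would otherwise
    # use its built-in default (1000, according to this report).
    out.extend(f"{d} {floor}" for d in LOCK_DIRECTIVES if d not in seen)
    return "\n".join(out) + "\n"
```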

(Don't close this bug yet, further observation has to happen.)

Regards,
Sven.


-- 
Sven Hartge -- professional Unix geek and all-time nerd
My thoughts on the net: http://sven.formvision.de/blog/



Information forwarded to debian-bugs-dist@lists.debian.org, Torsten Landschoff <torsten@debian.org>:
Bug#303057; Package slapd.

Acknowledgement sent to Steve Langasek <vorlon@debian.org>:
Extra info received and forwarded to list. Copy sent to Torsten Landschoff <torsten@debian.org>.

Message #76 received at 303057@bugs.debian.org:

From: Steve Langasek <vorlon@debian.org>
To: Sven Hartge <hartge@ds9.argh.org>, 303057@bugs.debian.org
Subject: Re: Bug#303057: slapd goes into endless sched_yield() loop
Date: Sat, 16 Apr 2005 02:29:50 -0700
Hi Sven,

On Mon, Apr 11, 2005 at 11:56:23PM +0200, Sven Hartge wrote:

> After the last episode of "killall slapd; db4.2_recover; /etc/init.d/slapd start"
> my DB_CONFIG looks like this:

> #txn_checkpoint         128     15      1
> set_cachesize           0       252428800        0
> set_lk_max_objects      100000
> set_lk_max_locks        100000
> #
> set_lk_max_lockers      100000
> #
> set_lg_regionmax        1048576
> set_lg_max              8388608
> set_lg_bsize            2097152
> set_lg_dir              /var/lib/ldap/logs/
> #set_lk_detect DB_LOCK_DEFAULT
> set_tmp_dir             /tmp/
> #set_flags DB_TXN_NOSYNC
> #set_flags DB_TXN_NOT_DURABLE

> Note the enormous amount of possible locks, lockers and lockable objects. 
> So far, only increasing this amount seems to be the way to circumvent the 
> dreaded sched_yield()-loop.

> My last change is

>   set_lk_max_lockers      100000

> which was still at its default value and seems to be the _real_ culprit, as per 
> ITS#2030:

> http://www.openldap.org/its/index.cgi/Software%20Bugs?id=2030;selectid=2030;usearchives=1
> (Note how old this bug report to the OpenLDAP ITS is.)

The last follow-up in that ITS points to
<http://www.sleepycat.com/docs/ref/lock/max.html>, which gives guidelines
about how to tune the number of available locks and lockers.  Is this what
you did?  Has the database held up under stress after making these changes?

I'm not sure how useful it is to set a fixed number of lockers by default,
since the optimal value depends on usage statistics; but bumping from 1000
to 5000 doesn't seem like it can hurt much.
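Since the right value depends on usage, one option is to derive each limit from an observed peak (for instance the "at any one time" figures reported by `db_stat -c`) multiplied by a safety factor, rounded up and never below a floor. In this Python sketch the factor of 2 and the rounding step of 1000 are arbitrary assumptions for illustration, the floor of 5000 echoes the figure discussed in this report, and none of it is Sleepycat's published sizing rule.

```python
import math


def suggest_lock_limit(observed_peak, factor=2.0, floor=5000, step=1000):
    """Suggest a lock/locker limit from an observed peak (heuristic sketch)."""
    if observed_peak < 0 or factor < 1:
        raise ValueError("observed_peak must be >= 0 and factor >= 1")
    # Leave headroom above the peak, rounded up to a multiple of `step`.
    wanted = math.ceil(observed_peak * factor / step) * step
    return max(floor, wanted)
```

For example, a peak of 2600 lockers suggests 6000, while any peak up to 2500 stays at the 5000 floor.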

Thanks,
-- 
Steve Langasek
postmodern programmer

Information forwarded to debian-bugs-dist@lists.debian.org, Torsten Landschoff <torsten@debian.org>:
Bug#303057; Package slapd.

Acknowledgement sent to Sven Hartge <hartge@ds9.argh.org>:
Extra info received and forwarded to list. Copy sent to Torsten Landschoff <torsten@debian.org>.

Message #81 received at 303057@bugs.debian.org:

From: Sven Hartge <hartge@ds9.argh.org>
To: Steve Langasek <vorlon@debian.org>
Cc: 303057@bugs.debian.org
Subject: Re: Bug#303057: slapd goes into endless sched_yield() loop
Date: Sat, 16 Apr 2005 16:08:51 +0200 (CEST)
At 02:29 on 16.04.05, Steve Langasek wrote:

>> #txn_checkpoint         128     15      1
>> set_cachesize           0       252428800        0
>> set_lk_max_objects      100000
>> set_lk_max_locks        100000
>> #
>> set_lk_max_lockers      100000
>> #
>> set_lg_regionmax        1048576
>> set_lg_max              8388608
>> set_lg_bsize            2097152
>> set_lg_dir              /var/lib/ldap/logs/
>> #set_lk_detect DB_LOCK_DEFAULT
>> set_tmp_dir             /tmp/
>> #set_flags DB_TXN_NOSYNC
>> #set_flags DB_TXN_NOT_DURABLE
 
>> Note the enormous amount of possible locks, lockers and lockable objects. 
>> So far, only increasing this amount seems to be the way to circumvent the 
>> dreaded sched_yield()-loop.

>> http://www.openldap.org/its/index.cgi/Software%20Bugs?id=2030;selectid=2030;usearchives=1
>> (Note how old this bug report to the OpenLDAP ITS is.)
 
> The last follow-up in that ITS points to 
> <http://www.sleepycat.com/docs/ref/lock/max.html>, which gives 
> guidelines about how to tune the number of available locks and lockers.  
> Is this what you did?

Correct. But since I desperately needed the databases (after all, these are
my production LDAP servers), I raised the number to this very high value
so they would hold up in any case, without my having to rebuild the
database manually every 6 to 12 hours.

I am about to lower this to 10000, since 100000 is just too high for my
workload and consumes too much memory.

> Has the database held up under stress after making these changes?

Yes, after the changes, I experienced no more lockups or database
corruptions.
 
> I'm not sure how useful it is to set a fixed number of lockers by 
> default, since the optimal value depends on usage statistics; but 
> bumping from 1000 to 5000 doesn't seem like it can hurt much.

1000 seems too low in most cases; at least my experience shows this.

But I still consider it a grave bug in db4.2 if running out of lockers
corrupts the database. And I consider it a bug in slapd if it runs into a
busy-waiting loop when something inside the database goes wrong.
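To illustrate the distinction drawn here: a waiter that calls sched_yield() in an unbounded loop burns CPU forever if the resource never becomes available, whereas a bounded retry eventually gives up and can report an error. A Python sketch of the bounded variant; this is not slapd's or Berkeley DB's code, and the function name, timeout and backoff values are assumptions.

```python
import os
import time


def _yield_cpu():
    # os.sched_yield is available on POSIX systems; fall back to sleep(0).
    if hasattr(os, "sched_yield"):
        os.sched_yield()
    else:
        time.sleep(0)


def acquire_with_deadline(try_acquire, timeout=1.0, max_backoff=0.05):
    """Retry a non-blocking acquire until it succeeds or `timeout` passes.

    Returns True on success and False on timeout, instead of spinning in
    `while not try_acquire(): sched_yield()` forever.
    """
    deadline = time.monotonic() + timeout
    delay = 0.0
    while not try_acquire():
        if time.monotonic() >= deadline:
            return False
        if delay == 0.0:
            _yield_cpu()        # first retry: just give up the CPU once
            delay = 0.001
        else:
            time.sleep(delay)   # later retries: exponential backoff, capped
            delay = min(delay * 2, max_backoff)
    return True
```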

Regards,
Sven.

-- 
Sven Hartge -- professional Unix geek and all-time nerd
My thoughts on the net: http://sven.formvision.de/blog/



Information forwarded to debian-bugs-dist@lists.debian.org, Torsten Landschoff <torsten@debian.org>:
Bug#303057; Package slapd.

Acknowledgement sent to Samuel Thibault <samuel.thibault@ens-lyon.org>:
Extra info received and forwarded to list. Copy sent to Torsten Landschoff <torsten@debian.org>.

Message #86 received at 303057@bugs.debian.org:

From: Samuel Thibault <samuel.thibault@ens-lyon.org>
To: 303057@bugs.debian.org
Cc: julien.cristau@ens-lyon.org
Subject: Re: Bug#303057: slapd goes in endless sched_yield() loop
Date: Fri, 29 Apr 2005 23:27:41 +0200
Hi,

It seems we've hit the same bug here: we had to reboot our server twice,
and upon restart, slapd went into just the same sched_yield() loop with
the same backtrace.  Running db4.2_recover worked fine to correct things
while LD_ASSUME_KERNEL=2.4 didn't help. Our system is really far from
being overloaded so I don't think we hit the max_lockers limit. If you
want, I kept a copy of the database, both erroneous and corrected by
db4.2_recover. Julien, there's nothing *that* much sensitive in them,
maybe we can privately post an url holding them ?

Regards,
Samuel Thibault



Tags added: pending. Request was from Torsten Landschoff <t.landschoff@gmx.net> to control@bugs.debian.org.

Tags added: pending. Request was from Torsten Landschoff <t.landschoff@gmx.net> to control@bugs.debian.org.

Reply sent to Torsten Landschoff <torsten@debian.org>:
You have taken responsibility.

Notification sent to Wolfgang Kohnen <wollie@tzi.de>:
Bug acknowledged by developer.

Message #95 received at 255276-close@bugs.debian.org:

From: Torsten Landschoff <torsten@debian.org>
To: 255276-close@bugs.debian.org
Subject: Bug#255276: fixed in openldap2.2 2.2.23-6
Date: Sun, 29 May 2005 13:34:57 -0400
Source: openldap2.2
Source-Version: 2.2.23-6

We believe that the bug you reported is fixed in the latest version of
openldap2.2, which is due to be installed in the Debian FTP archive:

ldap-utils_2.2.23-6_i386.deb
  to pool/main/o/openldap2.2/ldap-utils_2.2.23-6_i386.deb
libldap-2.2-7_2.2.23-6_i386.deb
  to pool/main/o/openldap2.2/libldap-2.2-7_2.2.23-6_i386.deb
openldap2.2_2.2.23-6.diff.gz
  to pool/main/o/openldap2.2/openldap2.2_2.2.23-6.diff.gz
openldap2.2_2.2.23-6.dsc
  to pool/main/o/openldap2.2/openldap2.2_2.2.23-6.dsc
slapd_2.2.23-6_i386.deb
  to pool/main/o/openldap2.2/slapd_2.2.23-6_i386.deb



A summary of the changes between this version and the previous one is
attached.

Thank you for reporting the bug, which will now be closed.  If you
have further comments please address them to 255276@bugs.debian.org,
and the maintainer will reopen the bug report if appropriate.

Debian distribution maintenance software
pp.
Torsten Landschoff <torsten@debian.org> (supplier of updated openldap2.2 package)

(This message was generated automatically at their request; if you
believe that there is a problem with it please contact the archive
administrators by mailing ftpmaster@debian.org)


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Format: 1.7
Date: Sun, 29 May 2005 18:23:20 +0200
Source: openldap2.2
Binary: slapd ldap-utils libldap-2.2-7
Architecture: source i386
Version: 2.2.23-6
Distribution: unstable
Urgency: low
Maintainer: Torsten Landschoff <torsten@debian.org>
Changed-By: Torsten Landschoff <torsten@debian.org>
Description: 
 ldap-utils - OpenLDAP utilities
 libldap-2.2-7 - OpenLDAP libraries
 slapd      - OpenLDAP server (slapd)
Closes: 255276 303505 306229 308234 310422
Changes: 
 openldap2.2 (2.2.23-6) unstable; urgency=low
 .
   Torsten Landschoff <torsten@debian.org>:
   * debian/po/ja.po: Merge updates from Kenshi Muto (closes: #303505).
   * debian/po/fr.po: Merge updates from Christian Perrier (closes: #306229).
   * debian/slapd.scripts-common: If the user enters the empty value for
     the database dumping directory use the default value. Seems like the
     readline interface does not care about the default value
     (closes: #308234).
   * debian/slapd.postinst: Make sure the debhelper commands are executed
     in all cases (closes: #310422).
   * Merged suggested changes by Eugene Konev to automatically run
     db_recover before starting slapd (closes: #255276).
     + debian/slapd.init: Run db_recover if enabled and available and no
       slapd process running.
     + debian/slapd.default: Add configuration option to disable it.
   * Applied and improved patch by Matthijs Mohlmann to support migration
     from ldbm to bdb backend.
     + debian/slapd.config: Ask if migration is wanted.
     + debian/slapd.postinst: Update configuration from ldbm to bdb if yes.
     + debian/slapd.scripts-common: Implemented some parts in their own
       functions.
   * Add a README.DB_CONFIG.gz and reference it where referring to BDB
     configuration.
   * Update default DB_CONFIG with some senseful values.
 .
   Steve Langasek <vorlon@debian.org>:
   * libraries/libldap_r/Makefile.in: make sure the ximian-connector ntlm
     patch is applied to libldap_r, not just to libldap
   * debian/move_files: make libldap a symlink to libldap_r, as carrying
     two versions of this library around is more trouble than it's worth,
     and can cause glorious segfaults down the line
Files: 
 1b46caee7a3377aff6ab29c3034dde86 1035 net optional openldap2.2_2.2.23-6.dsc
 20983ed8e341b87a04116cd7db075e20 489688 net optional openldap2.2_2.2.23-6.diff.gz
 80f24b17e4700ef5b8763c13f8051d3e 809150 net optional slapd_2.2.23-6_i386.deb
 8e72f04c89b139f48da941138113fa5a 118614 net optional ldap-utils_2.2.23-6_i386.deb
 73b838fd8862e1b6c2fd7e029dc84d47 151250 libs important libldap-2.2-7_2.2.23-6_i386.deb

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQFCmfsCdQgHtVUb5EcRAj1rAJwPK6SUSnp1F8D0jy5j4rUUc4CksACfdNCI
gb84g+HfrrjhwJuSVlH0CQg=
=r31z
-----END PGP SIGNATURE-----
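The db_recover change listed in the changelog above amounts to a three-part condition. The real implementation is shell code in debian/slapd.init; purely to spell out the logic as the changelog describes it, here is a Python sketch in which every name, including the candidate binary names, is hypothetical.

```python
import shutil


def find_db_recover(candidates=("db4.2_recover", "db_recover")):
    """Return the first db_recover binary found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def should_run_db_recover(enabled, recover_binary, slapd_running):
    """Recover only if enabled, a binary is available and slapd is down."""
    # Recovery must not run while another process (slapd) is using the
    # database environment, hence the last condition.
    return bool(enabled) and recover_binary is not None and not slapd_running
```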






Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Fri Apr 25 08:42:40 2014; Machine Name: beach.debian.org

Debian Bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.