Debian Bug report logs - #696503
libpipeline: add generator command support

Package: libpipeline; Maintainer for libpipeline is Colin Watson <cjwatson@debian.org>;

Reported by: Loïc Minier <lool@dooz.org>

Date: Fri, 17 Jun 2011 13:21:01 UTC

Severity: wishlist


Report forwarded to debian-bugs-dist@lists.debian.org, Colin Watson <cjwatson@debian.org>:
Bug#630799; Package man-db. (Fri, 17 Jun 2011 13:21:04 GMT)


Acknowledgement sent to Loïc Minier <lool@dooz.org>:
New Bug report received and forwarded. Copy sent to Colin Watson <cjwatson@debian.org>. (Fri, 17 Jun 2011 13:21:05 GMT)


Message #5 received at submit@bugs.debian.org:

From: Loïc Minier <lool@dooz.org>
To: Debian Bug Tracking System <submit@bugs.debian.org>
Subject: Splitting tools out to help build-deps install time
Date: Fri, 17 Jun 2011 15:19:09 +0200
Package: man-db
Version: 2.6.0.2-1
Severity: wishlist

        Hey there

 I've seen this while installing build-deps in chroots for years:
    Building database of manual pages ...

 and that's likely because I am too lazy to set man-db/auto-update
 properly.  Pbuilder offers an opt-in hook to do this, and I'm sure this
 could be done in other software, but it turns out most build software
 doesn't bother with this.  I checked random buildd logs of qemu and
 qemu-linaro in Debian and Ubuntu and found:
    Setting up man-db (2.6.0.2-1) ...
    Building database of manual pages ...
 this is particularly common because debhelper depends on man-db (as
 dh_installman calls man it seems); lintian also depends on man-db, but
 this is likely less of an issue on buildds.

 In the interest of saving buildd time without anyone having to set
 man-db/auto-update, I propose that we split the tools and the trigger /
 database handling in separate packages so that debhelper/lintian just
 depend on the tools, not on the presence of a database.

 NB: this is particularly bad on ports architectures where it often
 takes minutes to generate the DB for some reason

   Cheers,
-- 
Loïc Minier




Information forwarded to debian-bugs-dist@lists.debian.org:
Bug#630799; Package man-db. (Fri, 17 Jun 2011 14:12:03 GMT)


Acknowledgement sent to Colin Watson <cjwatson@debian.org>:
Extra info received and forwarded to list. (Fri, 17 Jun 2011 14:12:03 GMT)


Message #10 received at 630799@bugs.debian.org:

From: Colin Watson <cjwatson@debian.org>
To: Loïc Minier <lool@dooz.org>, 630799@bugs.debian.org
Subject: Re: Bug#630799: Splitting tools out to help build-deps install time
Date: Fri, 17 Jun 2011 15:09:43 +0100
On Fri, Jun 17, 2011 at 03:19:09PM +0200, Loïc Minier wrote:
>  In the interest of saving buildd time without anyone having to set
>  man-db/auto-update, I propose that we split the tools and the trigger /
>  database handling in separate packages so that debhelper/lintian just
>  depend on the tools, not on the presence of a database.

I really don't want to do this.  I'd rather optimise mandb.

-- 
Colin Watson                                       [cjwatson@debian.org]




Information forwarded to debian-bugs-dist@lists.debian.org, Colin Watson <cjwatson@debian.org>:
Bug#630799; Package man-db. (Fri, 17 Jun 2011 15:09:03 GMT)


Acknowledgement sent to Loïc Minier <lool@dooz.org>:
Extra info received and forwarded to list. Copy sent to Colin Watson <cjwatson@debian.org>. (Fri, 17 Jun 2011 15:09:03 GMT)


Message #15 received at 630799@bugs.debian.org:

From: Loïc Minier <lool@dooz.org>
To: Colin Watson <cjwatson@debian.org>
Cc: 630799@bugs.debian.org
Subject: Re: Bug#630799: Splitting tools out to help build-deps install time
Date: Fri, 17 Jun 2011 17:06:29 +0200
On Fri, Jun 17, 2011, Colin Watson wrote:
> I really don't want to do this.  I'd rather optimise mandb.

 Ok; just so that I understand, is this about avoiding confusion for
 users, or complexity or...?

 I guess we can repurpose this bug to "man-db is too slow on armel/ppc"
 or something, which are arches where I've witnessed this.

-- 
Loïc Minier




Information forwarded to debian-bugs-dist@lists.debian.org:
Bug#630799; Package man-db. (Sat, 09 Jul 2011 16:51:03 GMT)


Acknowledgement sent to Colin Watson <cjwatson@debian.org>:
Extra info received and forwarded to list. (Sat, 09 Jul 2011 16:51:03 GMT)


Message #20 received at 630799@bugs.debian.org:

From: Colin Watson <cjwatson@debian.org>
To: Loïc Minier <lool@dooz.org>, 630799@bugs.debian.org
Subject: Re: Bug#630799: Splitting tools out to help build-deps install time
Date: Sat, 9 Jul 2011 17:47:37 +0100
On Fri, Jun 17, 2011 at 05:06:29PM +0200, Loïc Minier wrote:
> On Fri, Jun 17, 2011, Colin Watson wrote:
> > I really don't want to do this.  I'd rather optimise mandb.
> 
>  Ok; just so that I understand, is this about avoiding confusion for
>  users, or complexity or...?

Splitting packages is for life, not just for Christmas.  Once I do it,
I'm pretty much stuck with it, or at least some vestige of it, forever.
Thus, I'm reluctant to do it solely for performance reasons which I feel
can be addressed in other ways.

If I exhaust the possibilities for optimising mandb without reaching
acceptable performance, then I'm willing to revisit splitting some tools
out into a separate package.

>  I guess we can repurpose this bug to "man-db is too slow on armel/ppc"
>  or something, which are arches where I've witnessed this.

Actually, I think I could do a lot better generally.  For example,
compare these two operations which have identical output, with hot cache
on a reasonably decent i386 laptop with fast SSD:

  <cjwatson@sarantium /usr/share/man>$ time find -type f | xargs cat | zcat >/dev/null
  
  real    0m2.494s
  user    0m2.440s
  sys     0m0.324s

  <cjwatson@sarantium /usr/share/man>$ time find -type f | xargs -n1 zcat >/dev/null
  
  real    1m27.988s
  user    0m7.940s
  sys     0m16.373s

mandb is currently acting more like the latter than the former (and, for
that matter, has similar runtime).  OK, so it isn't actually execing
zcat every time, instead forking and having one of the child processes
run an in-process function which uses zlib, thus saving an execve per
process and all the associated process startup costs, and I seem to
remember that that made a noticeable performance difference; but even
so, simply forking 20000-odd processes (as in my example, which is in a
fairly complete environment with lots of manual pages installed;
probably very much less in a build chroot) isn't cheap.
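
For illustration only: the two commands above can produce identical output because a gzip stream may consist of several members back to back, and decompressing the concatenation yields the concatenation of the decompressed members. The following is a minimal Python sketch of that property, not code from man-db; the function names and the page contents are made up for the example.

```python
import gzip


def concat_then_decompress(pages):
    """Mimic `cat *.gz | zcat`: join the compressed members, decompress once."""
    blob = b"".join(gzip.compress(page) for page in pages)
    # gzip.decompress accepts multi-member gzip data.
    return gzip.decompress(blob)


def decompress_each(pages):
    """Mimic `xargs -n1 zcat`: decompress every member separately."""
    return b"".join(gzip.decompress(gzip.compress(page)) for page in pages)


if __name__ == "__main__":
    # Hypothetical stand-ins for compressed manual pages.
    sample = [b".TH FOO 1\n", b".TH BAR 8\n"]
    print(concat_then_decompress(sample) == decompress_each(sample))
```

The output is the same either way; what differs is how many processes are needed to produce it.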

In fact, strace indicates that mandb is forking on the order of four
processes per page.  Just the cost of forking, exiting, and waiting for
that number of processes comes to 23 seconds on my system out of mandb's
total runtime of around 100 seconds, and I strongly suspect that doing
any non-trivial multi-process work like this gives the scheduler trouble
and slows everything down further due to the sheer number of context
switches involved (trashing CPU caches, doing TLB flushes, etc.).
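
As a rough way to reproduce this kind of measurement, the sketch below times bare fork/exit/wait cycles in Python on a POSIX system. It is an assumption-laden illustration and not how the figure above was obtained: the function name and the sample size are arbitrary, and a Python parent process is heavier to fork than mandb, so the absolute numbers are not comparable.

```python
import os
import time


def time_fork_wait(count):
    """Fork `count` children that exit immediately; return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(count):
        pid = os.fork()
        if pid == 0:
            # Child: exit at once, skipping Python's normal cleanup handlers.
            os._exit(0)
        # Parent: reap the child before forking the next one.
        os.waitpid(pid, 0)
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 2000  # arbitrary sample size, not a figure from this report
    elapsed = time_fork_wait(n)
    print(f"{n} fork/exit/wait cycles took {elapsed:.3f}s")
```

This measures only process creation and teardown; it does not capture the scheduler and cache effects suspected above.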

My plan here is to beef up libpipeline so that I can do all of mandb's
work in a single process.  In fact, I've had a to-do entry in the code
for some time: "ideally, could there be a facility to execute
non-blocking functions without needing to fork?"  These would be
something like coroutines or generators.  If I do this in libpipeline,
then the changes in man-db can be very small and wouldn't make the code
much harder to maintain: it would still look like running a pipeline of
processes, except that some of them happen to be non-forking function
calls, much as some of them can currently be function calls executed in
a child process.  The called functions would just need to be written
such that they can yield control and be re-entered later rather than
blocking.
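
To make the idea concrete, here is a minimal sketch using Python generators rather than libpipeline: each stage is a function that yields control after producing a little output and is re-entered when the next stage wants more, so the whole pipeline runs in one process with no fork. All names here are hypothetical and none of this reflects libpipeline's actual or planned API.

```python
import gzip
import zlib


def read_chunks(data, size=4096):
    """Source stage: hand out the compressed input in small chunks."""
    for offset in range(0, len(data), size):
        yield data[offset:offset + size]


def gunzip(chunks):
    """Filter stage: incrementally decompress gzip data."""
    # wbits = MAX_WBITS | 16 tells zlib to expect a gzip header and trailer.
    inflater = zlib.decompressobj(zlib.MAX_WBITS | 16)
    for chunk in chunks:
        out = inflater.decompress(chunk)
        if out:
            yield out
    tail = inflater.flush()
    if tail:
        yield tail


def first_lines(chunks, limit):
    """Sink-side stage: yield at most `limit` lines, then stop pulling."""
    buf = b""
    emitted = 0
    for chunk in chunks:
        buf += chunk
        while b"\n" in buf and emitted < limit:
            line, buf = buf.split(b"\n", 1)
            yield line
            emitted += 1
        if emitted >= limit:
            return
    if buf and emitted < limit:
        yield buf


if __name__ == "__main__":
    # Hypothetical page content, compressed in memory for the example.
    page = gzip.compress(b".TH FOO 1\n.SH NAME\nfoo \\- do things\n")
    for line in first_lines(gunzip(read_chunks(page, 8)), 2):
        print(line.decode())
```

The stages compose like a shell pipeline, but data moves between them by function re-entry instead of through pipes between processes.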

If that doesn't speed things up enough, then I can look at having more
things done by passing buffers around rather than reading and writing
over pipes.  That breaks some useful abstraction layers, though (less
common compression methods are implemented by calling programs like
bzcat, and I'd rather not have to link directly against lots of
decompression libraries), and I'm not sure that it will be necessary.
My instinct is that I can make a very serious dent in mandb's runtime
without resorting to that.

Cheers,

-- 
Colin Watson                                       [cjwatson@debian.org]




Bug 630799 cloned as bug 696503. Request was from Colin Watson <cjwatson@debian.org> to control@bugs.debian.org. (Fri, 21 Dec 2012 21:15:03 GMT)


Bug reassigned from package 'man-db' to 'libpipeline'. Request was from Colin Watson <cjwatson@debian.org> to control@bugs.debian.org. (Fri, 21 Dec 2012 21:15:03 GMT)


No longer marked as found in versions man-db/2.6.0.2-1. Request was from Colin Watson <cjwatson@debian.org> to control@bugs.debian.org. (Fri, 21 Dec 2012 21:15:04 GMT)


Changed Bug title to 'libpipeline: add generator command support' from 'Splitting tools out to help build-deps install time'. Request was from Colin Watson <cjwatson@debian.org> to control@bugs.debian.org. (Fri, 21 Dec 2012 21:15:04 GMT)


Added indication that bug 696503 blocks 630799. Request was from Colin Watson <cjwatson@debian.org> to control@bugs.debian.org. (Fri, 21 Dec 2012 21:15:06 GMT)


Changed Bug submitter to 'Loïc Minier <lool@dooz.org>' from 'Loïc Minier <lool@dooz.org>'. Request was from Don Armstrong <don@debian.org> to control@bugs.debian.org. (Thu, 21 Mar 2013 21:30:26 GMT)


Added indication that bug 696503 blocks 911019. Request was from Colin Watson <cjwatson@debian.org> to control@bugs.debian.org. (Mon, 24 Jan 2022 04:21:03 GMT)



Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Fri Jan 12 13:22:40 2024; Machine Name: buxtehude
