Debian Bug report logs - #696503
libpipeline: add generator command support
Report forwarded
to debian-bugs-dist@lists.debian.org, Colin Watson <cjwatson@debian.org>:
Bug#630799; Package man-db.
(Fri, 17 Jun 2011 13:21:04 GMT).
Acknowledgement sent
to Loïc Minier <lool@dooz.org>:
New Bug report received and forwarded. Copy sent to Colin Watson <cjwatson@debian.org>.
(Fri, 17 Jun 2011 13:21:05 GMT).
Message #5 received at submit@bugs.debian.org:
Package: man-db
Version: 2.6.0.2-1
Severity: wishlist
Hey there,
I've seen this while installing build-deps in chroots for years:
    Building database of manual pages ...
and that's likely because I am too lazy to set man-db/auto-update
properly. Pbuilder offers an opt-in hook to do this, and I'm sure this
could be done in other software, but it turns out most build software
doesn't bother with this. I checked random buildd logs of qemu and
qemu-linaro in Debian and Ubuntu and found:
    Setting up man-db (2.6.0.2-1) ...
    Building database of manual pages ...
This is particularly common because debhelper depends on man-db (as
dh_installman calls man, it seems); lintian also depends on man-db, but
this is likely less of an issue on buildds.
In the interest of saving buildd time without anyone having to set
man-db/auto-update, I propose that we split the tools and the trigger /
database handling in separate packages so that debhelper/lintian just
depend on the tools, not on the presence of a database.
NB: this is particularly bad on ports architectures where it often
takes minutes to generate the DB for some reason
Cheers,
--
Loïc Minier
Information forwarded
to debian-bugs-dist@lists.debian.org:
Bug#630799; Package man-db.
(Fri, 17 Jun 2011 14:12:03 GMT).
Acknowledgement sent
to Colin Watson <cjwatson@debian.org>:
Extra info received and forwarded to list.
(Fri, 17 Jun 2011 14:12:03 GMT).
Message #10 received at 630799@bugs.debian.org:
On Fri, Jun 17, 2011 at 03:19:09PM +0200, Loïc Minier wrote:
> In the interest of saving buildd time without anyone having to set
> man-db/auto-update, I propose that we split the tools and the trigger /
> database handling in separate packages so that debhelper/lintian just
> depend on the tools, not on the presence of a database.
I really don't want to do this. I'd rather optimise mandb.
--
Colin Watson [cjwatson@debian.org]
Information forwarded
to debian-bugs-dist@lists.debian.org, Colin Watson <cjwatson@debian.org>:
Bug#630799; Package man-db.
(Fri, 17 Jun 2011 15:09:03 GMT).
Acknowledgement sent
to Loïc Minier <lool@dooz.org>:
Extra info received and forwarded to list. Copy sent to Colin Watson <cjwatson@debian.org>.
(Fri, 17 Jun 2011 15:09:03 GMT).
Message #15 received at 630799@bugs.debian.org:
On Fri, Jun 17, 2011, Colin Watson wrote:
> I really don't want to do this. I'd rather optimise mandb.
Ok; just so that I understand, is this about avoiding confusion for
users, or complexity, or...?
I guess we can repurpose this bug to "man-db is too slow on armel/ppc"
or something, which are arches where I've witnessed this.
--
Loïc Minier
Information forwarded
to debian-bugs-dist@lists.debian.org:
Bug#630799; Package man-db.
(Sat, 09 Jul 2011 16:51:03 GMT).
Acknowledgement sent
to Colin Watson <cjwatson@debian.org>:
Extra info received and forwarded to list.
(Sat, 09 Jul 2011 16:51:03 GMT).
Message #20 received at 630799@bugs.debian.org:
On Fri, Jun 17, 2011 at 05:06:29PM +0200, Loïc Minier wrote:
> On Fri, Jun 17, 2011, Colin Watson wrote:
> > I really don't want to do this. I'd rather optimise mandb.
>
> Ok; just so that I understand, is this about avoiding confusion for
> users, or complexity, or...?
Splitting packages is for life, not just for Christmas. Once I do it,
I'm pretty much stuck with it, or at least some vestige of it, forever.
Thus, I'm reluctant to do it solely for performance reasons which I feel
can be addressed in other ways.
If I exhaust the possibilities for optimising mandb without reaching
acceptable performance, then I'm willing to revisit splitting some tools
out into a separate package.
> I guess we can repurpose this bug to "man-db is too slow on armel/ppc"
> or something, which are arches where I've witnessed this.
Actually, I think I could do a lot better generally. For example,
compare these two operations which have identical output, with hot cache
on a reasonably decent i386 laptop with fast SSD:
  <cjwatson@sarantium /usr/share/man>$ time find -type f | xargs cat | zcat >/dev/null
  real    0m2.494s
  user    0m2.440s
  sys     0m0.324s
  <cjwatson@sarantium /usr/share/man>$ time find -type f | xargs -n1 zcat >/dev/null
  real    1m27.988s
  user    0m7.940s
  sys     0m16.373s
mandb is currently acting more like the latter than the former (and, for
that matter, has similar runtime). OK, so it isn't actually execing
zcat every time, instead forking and having one of the child processes
run an in-process function which uses zlib, thus saving an execve per
process and all the associated process startup costs, and I seem to
remember that that made a noticeable performance difference; but even
so, simply forking 20000-odd processes (as in my example, which is in a
fairly complete environment with lots of manual pages installed;
probably very much less in a build chroot) isn't cheap.
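The contrast between the two shell commands above can be reproduced in miniature. The following is only a rough Python sketch, not how mandb works: it decompresses a set of gzip files once inside a single process and once with one child process per file. The function names and the fake page contents are made up for illustration, and the child here is a full Python interpreter, so absolute timings will not resemble zcat or mandb.

```python
import gzip
import os
import subprocess
import sys


def make_pages(directory, count):
    """Write `count` tiny gzip-compressed fake manual pages; return their paths."""
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"page{i}.1.gz")
        with gzip.open(path, "wb") as f:
            f.write(f".TH PAGE{i} 1\n".encode())
        paths.append(path)
    return paths


def read_in_process(paths):
    """Decompress every page inside this process; return total bytes read."""
    total = 0
    for path in paths:
        with gzip.open(path, "rb") as f:
            total += len(f.read())
    return total


def read_one_child_per_page(paths):
    """Decompress every page in its own child process (like `xargs -n1 zcat`)."""
    helper = (
        "import gzip, sys; "
        "sys.stdout.buffer.write(gzip.open(sys.argv[1], 'rb').read())"
    )
    total = 0
    for path in paths:
        result = subprocess.run(
            [sys.executable, "-c", helper, path],
            check=True,
            capture_output=True,
        )
        total += len(result.stdout)
    return total
```

Both functions return the same byte count for the same files; timing each with time.perf_counter() over a few hundred pages should show the per-process overhead dominating the second one.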
In fact, strace indicates that mandb is forking on the order of four
processes per page. Just the cost of forking, exiting, and waiting for
that number of processes comes to 23 seconds on my system out of mandb's
total runtime of around 100 seconds, and I strongly suspect that doing
any non-trivial multi-process work like this gives the scheduler trouble
and slows everything down further due to the sheer number of context
switches involved (trashing CPU caches, doing TLB flushes, etc.).
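To get a feel for the bare fork/exit/wait cost on a given machine, something like the sketch below can be used. It is a hypothetical measurement helper, not the method used to obtain the 23-second figure above; it only works on Unix-like systems, forks children one at a time rather than concurrently, and forking a Python interpreter is not the same as forking mandb. The extrapolation helper simply multiplies; combining "20000-odd" pages with "on the order of four" forks per page is an assumption for illustration.

```python
import os
import time


def time_fork_exit_wait(count):
    """Fork `count` children that exit immediately; return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(count):
        pid = os.fork()
        if pid == 0:
            # Child: exit at once, skipping Python's normal cleanup.
            os._exit(0)
        # Parent: reap the child before forking the next one.
        os.waitpid(pid, 0)
    return time.perf_counter() - start


def extrapolate(per_fork_seconds, pages, forks_per_page):
    """Estimate the total fork overhead for a run over `pages` pages."""
    return per_fork_seconds * pages * forks_per_page


if __name__ == "__main__":
    sample = 200
    per_fork = time_fork_exit_wait(sample) / sample
    # Assumed workload: about 20000 pages at roughly four forks per page.
    print(f"per fork: {per_fork * 1000:.3f} ms")
    print(f"estimated overhead: {extrapolate(per_fork, 20000, 4):.1f} s")
```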
My plan here is to beef up libpipeline so that I can do all of mandb's
work in a single process. In fact, I've had a to-do entry in the code
for some time: "ideally, could there be a facility to execute
non-blocking functions without needing to fork?" These would be
something like coroutines or generators. If I do this in libpipeline,
then the changes in man-db can be very small and wouldn't make the code
much harder to maintain: it would still look like running a pipeline of
processes, except that some of them happen to be non-forking function
calls, much as some of them can currently be function calls executed in
a child process. The called functions would just need to be written
such that they can yield control and be re-entered later rather than
blocking.
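As an illustration of the general idea only (this is not libpipeline's API, which is a C library, and none of these names come from it), Python generators show what a pipeline of non-forking, re-enterable stages can look like. Each stage yields control whenever it has produced some output and is resumed when the next stage asks for more, so the whole pipeline runs in one process.

```python
import gzip
import zlib


def gunzip(chunks):
    """Stage: incrementally decompress gzip data, yielding output as it appears."""
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    for chunk in chunks:
        out = decompressor.decompress(chunk)
        if out:
            yield out
    tail = decompressor.flush()
    if tail:
        yield tail


def split_lines(chunks):
    """Stage: reassemble arbitrary byte chunks into complete lines."""
    pending = b""
    for chunk in chunks:
        pending += chunk
        *complete, pending = pending.split(b"\n")
        for line in complete:
            yield line
    if pending:
        yield pending


def run_pipeline(data, chunk_size=16):
    """Feed `data` through gunzip | split_lines without creating any process."""
    chunks = (data[i:i + chunk_size] for i in range(0, len(data), chunk_size))
    return list(split_lines(gunzip(chunks)))


if __name__ == "__main__":
    page = gzip.compress(b".TH FOO 1\n.SH NAME\nfoo - bar\n")
    for line in run_pipeline(page):
        print(line.decode())
```

The stages are composed the same way a shell pipeline would be, but control passes between them through ordinary function resumption instead of pipes and context switches.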
If that doesn't speed things up enough, then I can look at having more
things done by passing buffers around rather than reading and writing
over pipes. That breaks some useful abstraction layers, though (less
common compression methods are implemented by calling programs like
bzcat, and I'd rather not have to link directly against lots of
decompression libraries), and I'm not sure that it will be necessary.
My instinct is that I can make a very serious dent in mandb's runtime
without resorting to that.
Cheers,
--
Colin Watson [cjwatson@debian.org]
Bug 630799 cloned as bug 696503
Request was from Colin Watson <cjwatson@debian.org>
to control@bugs.debian.org.
(Fri, 21 Dec 2012 21:15:03 GMT).
Bug reassigned from package 'man-db' to 'libpipeline'.
Request was from Colin Watson <cjwatson@debian.org>
to control@bugs.debian.org.
(Fri, 21 Dec 2012 21:15:03 GMT).
No longer marked as found in versions man-db/2.6.0.2-1.
Request was from Colin Watson <cjwatson@debian.org>
to control@bugs.debian.org.
(Fri, 21 Dec 2012 21:15:04 GMT).
Changed Bug title to 'libpipeline: add generator command support' from 'Splitting tools out to help build-deps install time'
Request was from Colin Watson <cjwatson@debian.org>
to control@bugs.debian.org.
(Fri, 21 Dec 2012 21:15:04 GMT).
Added indication that bug 696503 blocks 630799
Request was from Colin Watson <cjwatson@debian.org>
to control@bugs.debian.org.
(Fri, 21 Dec 2012 21:15:06 GMT).
Changed Bug submitter to 'Loïc Minier <lool@dooz.org>' from 'Loïc Minier <lool@dooz.org>'
Request was from Don Armstrong <don@debian.org>
to control@bugs.debian.org.
(Thu, 21 Mar 2013 21:30:26 GMT).
Added indication that bug 696503 blocks 911019
Request was from Colin Watson <cjwatson@debian.org>
to control@bugs.debian.org.
(Mon, 24 Jan 2022 04:21:03 GMT).
Last modified: Fri Jan 12 13:22:40 2024