Debian Bug report logs - #691643
man-db -p's mtime check fails and man-db readds man pages needlessly

Package: man-db; Maintainer for man-db is Colin Watson <cjwatson@debian.org>; Source for man-db is src:man-db (PTS, buildd, popcon).

Reported by: Kari Pahula <kaol@debian.org>

Date: Sat, 27 Oct 2012 22:27:02 UTC

Severity: normal

Tags: fixed-upstream, upstream

Found in version man-db/2.6.3-1

Fixed in version man-db/2.10.0-1

Done: Colin Watson <cjwatson@debian.org>

Bug is archived. No further changes may be made.


Report forwarded to debian-bugs-dist@lists.debian.org, Colin Watson <cjwatson@debian.org>:
Bug#691643; Package man-db. (Sat, 27 Oct 2012 22:27:04 GMT)


Acknowledgement sent to Kari Pahula <kaol@debian.org>:
New Bug report received and forwarded. Copy sent to Colin Watson <cjwatson@debian.org>. (Sat, 27 Oct 2012 22:27:04 GMT)


Message #5 received at submit@bugs.debian.org:

From: Kari Pahula <kaol@debian.org>
To: Debian Bug Tracking System <submit@bugs.debian.org>
Subject: man-db -p's mtime check fails and man-db readds man pages needlessly
Date: Sun, 28 Oct 2012 01:13:35 +0300
Package: man-db
Version: 2.6.3-1
Severity: normal
Tags: upstream patch

mandb -p (which is run by the dpkg trigger) takes a lot of time to
run.  I started looking at it (again) and noticed something peculiar
(edited a bit for clarity).

# mandb -p | grep "manual pages were added"
0 manual pages were added.
# touch /usr/share/man/man3
# mandb -p | grep "manual pages were added"
1101 manual pages were added.
# touch /usr/share/man/man3
# mandb -p | grep "manual pages were added"
1101 manual pages were added.

As I haven't done anything but update the directory's mtime, I would
have expected the subsequent mandb -p runs to add no pages.  I tested
this on a few Debian installs with various uses and saw a similar thing
happen with their man directories.

I looked at the code and saw that mandb stores man pages' mtimes to
its database, but there's something off with the code and it gives
false positives on differing mtimes.  I didn't come up with a fix for
that issue, but using man pages' ctimes instead I could do this:

Index: man-db-2.6.3/src/check_mandirs.c
===================================================================
--- man-db-2.6.3.orig/src/check_mandirs.c	2012-10-27 12:00:13.000000000 +0300
+++ man-db-2.6.3/src/check_mandirs.c	2012-10-28 00:31:35.200950213 +0300
@@ -316,12 +316,14 @@
 		free (lg.whatis);
 }
 
-static inline void add_dir_entries (const char *path, char *infile)
+static inline void add_dir_entries (const char *path, char *infile,
+				    time_t last)
 {
 	char *manpage;
 	int len;
 	struct dirent *newdir;
 	DIR *dir;
+	struct stat filestat;
 
 	manpage = appendstr (NULL, path, "/", infile, "/", NULL);
 	len = strlen (manpage);
@@ -344,7 +346,10 @@
 		if (!(*newdir->d_name == '.' && 
 		      strlen (newdir->d_name) < (size_t) 3)) {
 			manpage = appendstr (manpage, newdir->d_name, NULL);
-			test_manfile (manpage, path);
+			if (!last ||
+			    (lstat (manpage, &filestat) == 0 &&
+			     filestat.st_ctime >= last))
+				test_manfile (manpage, path);
 			*(manpage + len) = '\0';
 		}
 		
@@ -508,7 +513,7 @@
 			if (!tty)
 				fprintf (stderr, "\n");
 		}
-		add_dir_entries (path, mandir->d_name);
+		add_dir_entries (path, mandir->d_name, last);
 		MYDBM_CLOSE (dbf);
 		amount++;
 	}


Note that test_manfile calls stat on manpage too and those calls could
be merged but I left that change out of this, for simplicity's sake.

I'm happy to say that this makes the trigger work a lot faster.  What
do you think of my approach to this?


-- System Information:
Debian Release: wheezy/sid
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: amd64 (x86_64)

Kernel: Linux 3.2.0-4-amd64 (SMP w/6 CPU cores)
Locale: LANG=C, LC_CTYPE=fi_FI.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages man-db depends on:
ii  bsdmainutils           9.0.3
ii  debconf [debconf-2.0]  1.5.46
ii  dpkg                   1.16.9
ii  groff-base             1.21-9
ii  libc6                  2.13-36
ii  libgdbm3               1.8.3-11
ii  libpipeline1           1.2.2-1
ii  zlib1g                 1:1.2.7.dfsg-13

man-db recommends no packages.

Versions of packages man-db suggests:
ii  chromium [www-browser]   22.0.1229.94~r161065-2
ii  elinks [www-browser]     0.12~pre5-8
ii  groff                    1.21-9
ii  iceweasel [www-browser]  10.0.10esr-1
ii  konqueror [www-browser]  4:4.8.4-2
ii  less                     451-1
ii  w3m [www-browser]        0.5.3-8

-- debconf information excluded



Information forwarded to debian-bugs-dist@lists.debian.org, Colin Watson <cjwatson@debian.org>:
Bug#691643; Package man-db. (Wed, 31 Oct 2012 19:45:02 GMT)


Acknowledgement sent to Kari Pahula <kaol@debian.org>:
Extra info received and forwarded to list. Copy sent to Colin Watson <cjwatson@debian.org>. (Wed, 31 Oct 2012 19:45:03 GMT)


Message #10 received at 691643@bugs.debian.org:

From: Kari Pahula <kaol@debian.org>
To: 691643@bugs.debian.org
Subject: Revised version of ctime patch
Date: Wed, 31 Oct 2012 21:30:27 +0200
I updated my patch a bit.

There was a problem with storing only the seconds part of the mtime as
mandb's last-changed time.  If mandb -p gets called twice within the
same second, the second run may not know to crawl a directory that
changed in between.  That didn't matter as much before, since the
changed files would be indexed on the next run, whenever that happened.
But with the ctime patch, there's a risk that some files would never be
picked up if this happened.  I made mandb store nanoseconds to avoid
this.
[test_manfile_stat (text/plain, attachment)]
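
The sub-second race described in this message can be illustrated with a
minimal sketch.  It is not taken from the attached patch or from man-db:
the names ts_cmp, needs_rescan and needs_rescan_seconds_only are
hypothetical, and the only assumption is that the directory's mtime is
compared against a stored "last run" timestamp.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical helper (not man-db code): three-way comparison of two
   timestamps, seconds first, then nanoseconds. */
static int ts_cmp (struct timespec a, struct timespec b)
{
	if (a.tv_sec != b.tv_sec)
		return a.tv_sec < b.tv_sec ? -1 : 1;
	if (a.tv_nsec != b.tv_nsec)
		return a.tv_nsec < b.tv_nsec ? -1 : 1;
	return 0;
}

/* Hypothetical check: rescan when the directory changed after the last
   recorded run, at nanosecond resolution. */
static bool needs_rescan (struct timespec dir_mtime,
			  struct timespec last_run)
{
	return ts_cmp (dir_mtime, last_run) > 0;
}

/* The same check with seconds only: a change made later within the
   same second as the last run goes unnoticed. */
static bool needs_rescan_seconds_only (time_t dir_mtime, time_t last_run)
{
	return dir_mtime > last_run;
}
```

With a last run at 1000 s + 100 ns and a directory change at
1000 s + 900 ns, only the nanosecond comparison asks for a rescan.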

Information forwarded to debian-bugs-dist@lists.debian.org:
Bug#691643; Package man-db. (Sat, 07 Dec 2013 13:39:09 GMT)


Acknowledgement sent to Colin Watson <cjwatson@debian.org>:
Extra info received and forwarded to list. (Sat, 07 Dec 2013 13:39:09 GMT)


Message #15 received at 691643@bugs.debian.org:

From: Colin Watson <cjwatson@debian.org>
To: man-db-devel@nongnu.org
Cc: 691643@bugs.debian.org
Subject: Re: [Man-db-devel] Database update profiling
Date: Sat, 7 Dec 2013 13:00:11 +0000
On Fri, Dec 06, 2013 at 04:46:43PM -0500, Francis Giraldeau wrote:
> On 2013-12-06 01:48, Kari Pahula wrote:
> > None of that code has yet made its way to mandb.
> 
> It's a good start, let's try make it ready.

For what it's worth, I'm actually slightly less interested in the patch
cleanup.  What I'm more interested in, and the reason I hadn't just gone
ahead and dealt with Kari's patch directly (sorry for not explaining
this!) is a more detailed analysis of Kari's comment in the bug report:
"there's something off with the code and it gives false positives on
differing mtimes".  What exactly is going on here?

I would really be more comfortable continuing to use mtimes if possible;
it is the more appropriate stat field to use, as it describes changes to
the file's contents rather than its metadata.  Using ctimes seems to me
to be a mistake.

Cheers,

-- 
Colin Watson                                       [cjwatson@debian.org]
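
The mtime/ctime distinction drawn in this message can be checked with a
small sketch.  It is an illustration only, not man-db code: the function
name chmod_keeps_mtime and the temporary path are hypothetical, and it
assumes a POSIX.1-2008 system where struct stat exposes st_mtim and
st_ctim.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for mkstemp, fchmod, st_mtim */
#endif

#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical demo: returns 1 if a metadata-only change (fchmod) left
   the file's mtime untouched while its ctime did not go backwards,
   0 if not, and -1 on error. */
static int chmod_keeps_mtime (void)
{
	char path[] = "/tmp/mtime-demo-XXXXXX";
	struct stat before, after;
	int fd, ret = -1;

	fd = mkstemp (path);	/* created with mode 0600 */
	if (fd < 0)
		return -1;
	if (fstat (fd, &before) == 0 &&
	    fchmod (fd, 0644) == 0 &&	/* metadata change only */
	    fstat (fd, &after) == 0) {
		int mtime_same =
			before.st_mtim.tv_sec == after.st_mtim.tv_sec &&
			before.st_mtim.tv_nsec == after.st_mtim.tv_nsec;
		int ctime_not_older =
			after.st_ctim.tv_sec > before.st_ctim.tv_sec ||
			(after.st_ctim.tv_sec == before.st_ctim.tv_sec &&
			 after.st_ctim.tv_nsec >= before.st_ctim.tv_nsec);
		ret = mtime_same && ctime_not_older;
	}
	close (fd);
	unlink (path);
	return ret;
}
```

A permission change is the kind of event that would make a ctime-based
check revisit a page whose contents are unchanged.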



Information forwarded to debian-bugs-dist@lists.debian.org:
Bug#691643; Package man-db. (Tue, 16 Sep 2014 00:09:04 GMT)


Acknowledgement sent to Colin Watson <cjwatson@debian.org>:
Extra info received and forwarded to list. (Tue, 16 Sep 2014 00:09:05 GMT)


Message #20 received at 691643@bugs.debian.org:

From: Colin Watson <cjwatson@debian.org>
To: man-db-devel@nongnu.org
Cc: Kari Pahula <kaol@debian.org>, 691643@bugs.debian.org
Subject: Re: [Man-db-devel] Database update profiling
Date: Tue, 16 Sep 2014 01:07:59 +0100
On Sat, Dec 07, 2013 at 01:00:11PM +0000, Colin Watson wrote:
> On Fri, Dec 06, 2013 at 04:46:43PM -0500, Francis Giraldeau wrote:
> > On 2013-12-06 01:48, Kari Pahula wrote:
> > > None of that code has yet made its way to mandb.
> > 
> > It's a good start, let's try make it ready.
> 
> For what it's worth, I'm actually slightly less interested in the patch
> cleanup.  What I'm more interested in, and the reason I hadn't just gone
> ahead and dealt with Kari's patch directly (sorry for not explaining
> this!) is a more detailed analysis of Kari's comment in the bug report:
> "there's something off with the code and it gives false positives on
> differing mtimes".  What exactly is going on here?
> 
> I would really be more comfortable continuing to use mtimes if possible;
> it is the more appropriate stat field to use, as it describes changes to
> the file's contents rather than its metadata.  Using ctimes seems to me
> to be a mistake.

Kari, would you mind giving current git master a try (see
http://man-db.nongnu.org/development.html)?  I've made some substantial
changes recently which are relevant to all this, in particular switching
everything over to use high-precision timestamps.  The database format
version changes as a result so I'd suggest running this only on test
copies of your manual databases, not on /usr/share/man etc. directly, as
it will be incompatible with your system's man-db programs.

I'm hoping that the general cleanup here will have made your original
bug go away.  If not, I'm still interested in a more detailed analysis
of exactly what is going wrong here.

Thanks,

-- 
Colin Watson                                       [cjwatson@debian.org]



Information forwarded to debian-bugs-dist@lists.debian.org:
Bug#691643; Package man-db. (Tue, 16 Sep 2014 13:54:11 GMT)


Acknowledgement sent to Colin Watson <cjwatson@debian.org>:
Extra info received and forwarded to list. (Tue, 16 Sep 2014 13:54:11 GMT)


Message #25 received at 691643@bugs.debian.org:

From: Colin Watson <cjwatson@debian.org>
To: man-db-devel@nongnu.org
Cc: Kari Pahula <kaol@debian.org>, 691643@bugs.debian.org
Subject: Re: [Man-db-devel] Database update profiling
Date: Tue, 16 Sep 2014 14:50:31 +0100
On Tue, Sep 16, 2014 at 01:07:59AM +0100, Colin Watson wrote:
> I'm hoping that the general cleanup here will have made your original
> bug go away.  If not, I'm still interested in a more detailed analysis
> of exactly what is going wrong here.

Ah, apparently I missed your mail that was sent just to man-db-devel and
not to the Debian bug; sorry.  I'll go over that in more detail then ...

-- 
Colin Watson                                       [cjwatson@debian.org]



Information forwarded to debian-bugs-dist@lists.debian.org, Colin Watson <cjwatson@debian.org>:
Bug#691643; Package man-db. (Wed, 24 Sep 2014 14:39:05 GMT)


Acknowledgement sent to Kari Pahula <kaol@debian.org>:
Extra info received and forwarded to list. Copy sent to Colin Watson <cjwatson@debian.org>. (Wed, 24 Sep 2014 14:39:05 GMT)


Message #30 received at 691643@bugs.debian.org:

From: Kari Pahula <kaol@debian.org>
To: 691643@bugs.debian.org
Cc: control@bugs.debian.org
Subject: Checked with 2.7.0: still readds linked manpages with different mtimes
Date: Wed, 24 Sep 2014 17:35:02 +0300
tags 691643 - patch
thanks

Hi.

I tested this with the new version.

# touch /usr/share/man/man3/
# mandb -p 2>/dev/null |tail -n3
1 man subdirectory contained newer manual pages.
1108 manual pages were added.
0 stray cats were added.
# mandb -p 2>/dev/null |tail -n3
0 man subdirectories contained newer manual pages.
0 manual pages were added.
0 stray cats were added.

Looks like this still happens.  For anyone following this, I wrote
last about this on the mailing list:

http://lists.nongnu.org/archive/html/man-db-devel/2013-12/msg00008.html

The simplest way I found to make the issue go away was to use stat
instead of lstat to get the mtimes, but that ran afoul of the test
suite.  I didn't explore that further.

I wrote a small script to equalize man links' mtimes with the man page
files.  I'm not suggesting it as a solution but if you want to see
what kind of impact this has, then feel free to test it (run mandb -c
afterwards).

#! /bin/bash

set -e

if [ -z "$1" ]; then
    echo "Usage: $0 <path>" >&2
    exit 1
fi

cd "$1"

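# Use find -print0 so that link names containing whitespace survive.
find . -type l -print0 | while IFS= read -r -d '' f; do
    # Resolve the target fully; a bare readlink result would be relative
    # to the link's directory, not to the current directory.
    target=$(readlink -f -- "$f") || continue
    [ -e "$target" ] || continue
    touch -h -m -c -r "$target" -- "$f"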
done
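
The stat/lstat difference mentioned in this message can be shown with a
short sketch.  It is an illustration under stated assumptions, not
man-db code: the function name link_mtimes, the file names page.3 and
alias.3, and the timestamp 1000000000 are all hypothetical.  It creates
a file with an old mtime and a fresh symlink to it, then reports the
mtime seen through each call.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for mkdtemp, utimensat, lstat */
#endif

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical demo.  Creates "page.3" with an old mtime and a new
   symlink "alias.3" pointing at it, then stores the mtime seen by stat
   (follows the link) and by lstat (the link itself).
   Returns 0 on success, -1 on error. */
static int link_mtimes (time_t *via_stat, time_t *via_lstat)
{
	char dir[] = "/tmp/lstat-demo-XXXXXX";
	char page[64], alias[64];
	struct timespec old[2] = {
		{ .tv_sec = 1000000000 },	/* atime */
		{ .tv_sec = 1000000000 }	/* mtime */
	};
	struct stat st, lst;
	int fd, ret = -1;

	if (!mkdtemp (dir))
		return -1;
	snprintf (page, sizeof page, "%s/page.3", dir);
	snprintf (alias, sizeof alias, "%s/alias.3", dir);

	fd = open (page, O_CREAT | O_WRONLY, 0644);
	if (fd >= 0) {
		close (fd);
		if (utimensat (AT_FDCWD, page, old, 0) == 0 &&
		    symlink ("page.3", alias) == 0 &&
		    stat (alias, &st) == 0 &&
		    lstat (alias, &lst) == 0) {
			*via_stat = st.st_mtime;
			*via_lstat = lst.st_mtime;
			ret = 0;
		}
	}
	/* Best-effort cleanup of the temporary files. */
	unlink (alias);
	unlink (page);
	rmdir (dir);
	return ret;
}
```

If a stored timestamp comes from one call and the comparison uses the
other, a linked page can look changed on every run, which matches the
symptom reported here.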



Removed tag(s) patch. Request was from Kari Pahula <kaol@debian.org> to control@bugs.debian.org. (Wed, 24 Sep 2014 14:39:08 GMT)


Information forwarded to debian-bugs-dist@lists.debian.org:
Bug#691643; Package man-db. (Mon, 31 Jan 2022 02:36:02 GMT)


Acknowledgement sent to Colin Watson <cjwatson@debian.org>:
Extra info received and forwarded to list. (Mon, 31 Jan 2022 02:36:02 GMT)


Message #37 received at 691643@bugs.debian.org:

From: Colin Watson <cjwatson@debian.org>
To: Kari Pahula <kaol@debian.org>, 691643@bugs.debian.org
Subject: Re: Bug#691643: Checked with 2.7.0: still readds linked manpages with different mtimes
Date: Mon, 31 Jan 2022 02:32:59 +0000
Control: tag -1 fixed-upstream

On Wed, Sep 24, 2014 at 05:35:02PM +0300, Kari Pahula wrote:
> Looks like this still happens.  For anyone following this, I wrote
> last about this on the mailing list:
> 
> http://lists.nongnu.org/archive/html/man-db-devel/2013-12/msg00008.html
> 
> The simplest way I found to make the issue go away was to use stat
> instead of lstat to get the mtimes, but that ran afoul of the test
> suite.  I didn't explore that further.

I finally managed to track this down and fix it upstream, prompted by a
question in https://bugs.debian.org/1004557:

  https://gitlab.com/cjwatson/man-db/-/commit/37ab864354c1d0ac09e27d2346a1221bf4628509

This will be part of man-db 2.10.0, which also fixes the performance
problems that led you to investigate this in the first place (see
https://bugs.debian.org/1003089).

Sorry for the very long delay!

-- 
Colin Watson (he/him)                              [cjwatson@debian.org]



Added tag(s) fixed-upstream. Request was from Colin Watson <cjwatson@debian.org> to 691643-submit@bugs.debian.org. (Mon, 31 Jan 2022 02:36:02 GMT)


Reply sent to Colin Watson <cjwatson@debian.org>:
You have taken responsibility. (Fri, 04 Feb 2022 15:51:15 GMT)


Notification sent to Kari Pahula <kaol@debian.org>:
Bug acknowledged by developer. (Fri, 04 Feb 2022 15:51:15 GMT)


Message #44 received at 691643-close@bugs.debian.org:

From: Debian FTP Masters <ftpmaster@ftp-master.debian.org>
To: 691643-close@bugs.debian.org
Subject: Bug#691643: fixed in man-db 2.10.0-1
Date: Fri, 04 Feb 2022 15:50:20 +0000
Source: man-db
Source-Version: 2.10.0-1
Done: Colin Watson <cjwatson@debian.org>

We believe that the bug you reported is fixed in the latest version of
man-db, which is due to be installed in the Debian FTP archive.

A summary of the changes between this version and the previous one is
attached.

Thank you for reporting the bug, which will now be closed.  If you
have further comments please address them to 691643@bugs.debian.org,
and the maintainer will reopen the bug report if appropriate.

Debian distribution maintenance software
pp.
Colin Watson <cjwatson@debian.org> (supplier of updated man-db package)

(This message was generated automatically at their request; if you
believe that there is a problem with it please contact the archive
administrators by mailing ftpmaster@ftp-master.debian.org)


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Format: 1.8
Date: Fri, 04 Feb 2022 15:30:35 +0000
Source: man-db
Architecture: source
Version: 2.10.0-1
Distribution: unstable
Urgency: medium
Maintainer: Colin Watson <cjwatson@debian.org>
Changed-By: Colin Watson <cjwatson@debian.org>
Closes: 630799 691643 941622 970482 974174 998426 1003089 1004248 1004355
Changes:
 man-db (2.10.0-1) unstable; urgency=medium
 .
   * Simplify some debhelper overrides slightly.
   * debian/upstream/metadata: Update for upstream move to GitLab.
   * Add section 0 to default search list (closes: #1004248).
   * New upstream snapshot:
     - Document MAN_DISABLE_SECCOMP and PIPELINE_DEBUG environment variables
       in man(1) (closes: #941622).
     - Add man-pages(7) reference to man(1) (closes: #974174).
     - lexgrog now produces output in the user's locale (closes: #970482).
     - Downgrade "malformed .lf request" warning to a debug message and
       rephrase it somewhat, since .lf requests can use *roff arithmetic
       expressions and we can't reasonably parse those (closes: #998426).
     - Significantly improve mandb(8) and "man -K" performance in the common
       case where pages are of moderate size and compressed using zlib
       (closes: #630799, #1003089; LP: #1858777).
     - Avoid modifying the database without changing its mtime, which had
       been possible since 2.7.0 if mandb's purge phase found work to do but
       the main phase didn't, and which confused some backup systems into
       reporting possible filesystem corruption (closes: #1004355,
       LP: #1411633).
     - mandb now stores the mtime of link targets as the mtime of their
       corresponding database entries, rather than sometimes storing the
       mtime of the link instead (closes: #691643).
Checksums-Sha1:
 bafe3f80fc81f241084601d2fa86237f1b17bfa0 2418 man-db_2.10.0-1.dsc
 ee3bf8ae326f3e193722ba11a608097dd694bd1f 1888196 man-db_2.10.0.orig.tar.xz
 717137ce1e2319daaab8812b3dc89cb7254055b3 833 man-db_2.10.0.orig.tar.xz.asc
 e564cd5c5ca3451a143f328ee9c117cd2bed6abc 72972 man-db_2.10.0-1.debian.tar.xz
Checksums-Sha256:
 23152ae5925ebb3bbeb17b8e5776a6c45c312d3ca215d11601858d59648e5e84 2418 man-db_2.10.0-1.dsc
 0a8629022f7117dc7fc6473c6fdb14913b24b106059bb056abee87dbd6070c79 1888196 man-db_2.10.0.orig.tar.xz
 01bdd84c2b3f106a31ad9d2c8926ba0ece57241eae0dc6b0cead640eb611543e 833 man-db_2.10.0.orig.tar.xz.asc
 bec718ecc64bb05fa8d2f63f153768919b012cf1614da8734afa07ec164eb63b 72972 man-db_2.10.0-1.debian.tar.xz
Files:
 b96773d05e3a92a32d6d17341afbfccf 2418 doc important man-db_2.10.0-1.dsc
 96009cd422f2e62b01b8c4de0f5691f1 1888196 doc important man-db_2.10.0.orig.tar.xz
 d5c04220af019a6adef3f49c82964d4b 833 doc important man-db_2.10.0.orig.tar.xz.asc
 655c82e59f78ce0d50f3ea781fe871e7 72972 doc important man-db_2.10.0-1.debian.tar.xz

-----BEGIN PGP SIGNATURE-----

iQIzBAEBCAAdFiEErApP8SYRtvzPAcEROTWH2X2GUAsFAmH9Rt4ACgkQOTWH2X2G
UAvmKhAAq0gntVMIt7i8RbrUqlJBtpwAvA7TqkQ44464isyoG2TM+yP8yPDlwp/7
oTBbZtmTzE6jK/7SDKHV+P/3Djh6mDSM9cKS62HU1MVKEPmybpqYFcfWFO79T68S
1pCNHHkff4lVMAFM370BmmZlhpLCMZFidbXsqzJQuB3NxutySmtI/MY/wy1wCSUq
0PB0v4YRPA5CKeWjo5LTCbGE9DvlUNVvvSA+HX0p3a1Go2aTfO/7cki4mtpn+P6V
e2VmEt9djAjfzEI9K2fMcyiJE6RhzpA+g2ZX4plo7Mm6kmx+XFTC2EV0yMDdVILe
ctncWpGFd6oROjAz3X0DH23a22l6UWdHQIsIuqq3bD5tZXKcXrcSPJzSdyO0YeBw
+AecHbsMYLtmG9N3KrF+H77EGTSQr1xdqkOimNkx5VdMq/LkDVxjQboRP/T1309Z
amYynIEQ+UwWYoGJSQ031RyfyysgLibPvvH0y/bg019zxux5WQNTqRsFLy092MZN
d6qdjJFVIb2RkwwYGGnNKXLXrBRVeKSl/Zv/KIrsxDLsSUnwV7Tmb/2z4Tz3ODCH
4cAIpdWz/9dSO8OhSvVMNHKGrGtNzL6+LFQ720P+qpZLaEchEdcEFZgvvt2+/zYw
KVCUn8QBkp/p3L3dFMmuvVTTANIQlDPkEqZd/VzPga2x/pTwyo0=
=v64e
-----END PGP SIGNATURE-----




Bug archived. Request was from Debbugs Internal Request <owner@bugs.debian.org> to internal_control@bugs.debian.org. (Sat, 05 Mar 2022 07:25:48 GMT)



Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Fri Jan 12 13:22:38 2024; Machine Name: buxtehude