Debian Bug report logs - #664794
lintian: should we compress some collections (file-info and index)?


Package: src:lintian; Maintainer for src:lintian is Debian Lintian Maintainers <lintian-maint@debian.org>;

Reported by: Niels Thykier <niels@thykier.net>

Date: Tue, 20 Mar 2012 22:03:02 UTC

Severity: wishlist

Found in version lintian/2.5.6

Fixed in version 2.5.7

Done: Niels Thykier <niels@thykier.net>

Bug is archived. No further changes may be made.



Information forwarded to debian-bugs-dist@lists.debian.org, niels@thykier.net, Debian Lintian Maintainers <lintian-maint@debian.org>:
Bug#664794; Package src:lintian. (Tue, 20 Mar 2012 22:03:05 GMT).


Acknowledgement sent to Niels Thykier <niels@thykier.net>:
Extra info received and forwarded to list. Copy sent to niels@thykier.net, Debian Lintian Maintainers <lintian-maint@debian.org>. (Tue, 20 Mar 2012 22:03:05 GMT).


Message #5 received at submit@bugs.debian.org:

From: Niels Thykier <niels@thykier.net>
To: Debian Bug Tracking System <submit@bugs.debian.org>
Subject: lintian: should we compress some collections (file-info and index)?
Date: Tue, 20 Mar 2012 23:01:29 +0100
Source: lintian
Version: 2.5.6
Severity: wishlist

I have been considering if it would be a good idea to (conditionally?)
compress certain collection files.  In some cases they are actually
rather large and I suspect compression will generally be good in such
cases[1].  Admittedly, there are also cases where it gives little to
no size reduction.

Code-wise, we should be able to enable this for file-info without any
greater hassle than updating L::Collect::Package and coll/file-info.
There are still some "ad-hoc" index parsers left in coll/*, but they
should be fairly straightforward to fix.

However, there have been people doing things like "grep -r $expression"
on the lab in the past[2], and compression could break some of these.

~Niels

For reference, the lab pool is 16 GB according to du -sh (13 GB with
--apparent-size).


[1]

$ wc -c < e/eclipse/eclipse_3.7.2-1_source/file-info
4600773
$ gzip --best -c < e/eclipse/eclipse_3.7.2-1_source/file-info  | wc -c
277286
$ wc -c < e/eclipse/eclipse_3.7.2-1_source/index
5462164
$ gzip --best -c < e/eclipse/eclipse_3.7.2-1_source/index  | wc -c
390669
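As a cross-check of the figures above: gzip --best brings file-info down to about 6.0% of its original size (4600773 to 277286 bytes, roughly 16.6x smaller) and index to about 7.2% (5462164 to 390669 bytes, roughly 14x). The Python sketch below repeats the same kind of measurement for arbitrary files. It uses the standard gzip module at level 9, which is zlib's deflate rather than the gzip binary, so sizes may differ by a few bytes from the shell pipeline; the function names are hypothetical.

```python
import gzip
import sys


def compressed_size(data: bytes) -> int:
    """Size in bytes of DATA after gzip at level 9 (roughly `gzip --best`)."""
    return len(gzip.compress(data, compresslevel=9))


def report(path: str) -> str:
    """Mimic `wc -c < FILE` next to `gzip --best -c < FILE | wc -c`."""
    with open(path, "rb") as fh:
        data = fh.read()
    raw, packed = len(data), compressed_size(data)
    ratio = packed / raw if raw else 0.0
    return f"{path}: {raw} -> {packed} bytes ({ratio:.1%} of original)"


if __name__ == "__main__":
    # Usage (hypothetical): python3 ratio.py e/eclipse/eclipse_3.7.2-1_source/index
    for path in sys.argv[1:]:
        print(report(path))
```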


[2] Such as (but not limited to) the Policy Maintainers :)

http://anonscm.debian.org/gitweb/?p=dbnpolicy/policy.git;a=blob;f=tools/license-count




Information forwarded to debian-bugs-dist@lists.debian.org, Debian Lintian Maintainers <lintian-maint@debian.org>:
Bug#664794; Package src:lintian. (Tue, 20 Mar 2012 22:27:05 GMT).


Acknowledgement sent to Russ Allbery <rra@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Lintian Maintainers <lintian-maint@debian.org>. (Tue, 20 Mar 2012 22:27:05 GMT).


Message #10 received at 664794@bugs.debian.org:

From: Russ Allbery <rra@debian.org>
To: Niels Thykier <niels@thykier.net>
Cc: 664794@bugs.debian.org
Subject: Re: Bug#664794: lintian: should we compress some collections (file-info and index)?
Date: Tue, 20 Mar 2012 15:25:41 -0700
Niels Thykier <niels@thykier.net> writes:

> I have been considering if it would be a good idea to (conditionally?)
> compress certain collection files.  In some cases they are actually
> rather large and I suspect compression will generally be good in such
> cases[1].  Admittedly, there are also cases where it gives little to no
> size reduction.

Compressing some stuff is not a bad idea.  The indices and file-info
collections seem like the most obvious targets.  People doing greps can
switch to zgreps.

I would prefer to never conditionally compress anything; either always
compress it or never compress it.  That way, the file names and access
method are always consistent.

-- 
Russ Allbery (rra@debian.org)               <http://www.eyrie.org/~eagle/>




Information forwarded to debian-bugs-dist@lists.debian.org, Debian Lintian Maintainers <lintian-maint@debian.org>:
Bug#664794; Package src:lintian. (Wed, 21 Mar 2012 08:09:03 GMT).


Acknowledgement sent to Niels Thykier <niels@thykier.net>:
Extra info received and forwarded to list. Copy sent to Debian Lintian Maintainers <lintian-maint@debian.org>. (Wed, 21 Mar 2012 08:09:03 GMT).


Message #15 received at 664794@bugs.debian.org:

From: Niels Thykier <niels@thykier.net>
To: 664794@bugs.debian.org
Subject: Re: Bug#664794: lintian: should we compress some collections (file-info and index)?
Date: Wed, 21 Mar 2012 09:04:49 +0100
On 2012-03-20 23:25, Russ Allbery wrote:
> Niels Thykier <niels@thykier.net> writes:
> 
>> I have been considering if it would be a good idea to (conditionally?)
>> compress certain collection files.  In some cases they are actually
>> rather large and I suspect compression will generally be good in such
>> cases[1].  Admittedly, there are also cases where it gives little to no
>> size reduction.
> 
> Compressing some stuff is not a bad idea.  The indices and file-info
> collections seem like the most obvious targets.  People doing greps can
> switch to zgreps.
> 

True, but it kind of implies that they are aware of changes we make in
the Lab. :)

> I would prefer to never conditionally compress anything; either always
> compress it or never compress it.  That way, the file names and access
> method are always consistent.
> 

Originally I had thought of reusing _open_data_file (from harness) to
access the file(s).  But I do see a point in making the access
consistent (especially for people doing "grep -r" checks).

That still leaves the question of how to migrate from uncompressed to
compressed.  If we go "compressed"-only, we have to do a full run (or a
find -name | xargs gzip).  I guess that is reasonable to do; we just
need to tell people maintaining lintian.$domain.$tld to do the same.
Alternatively, we can bump the version of these collections and have
Lintian slowly migrate as packages are (re-)checked, but that means
(non-Lintian) access will be inconsistent until all packages have been
re-checked.
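A minimal sketch of the one-off "compress everything in the lab" route, written in Python instead of find/xargs. It assumes the collection files are named exactly file-info and index inside each package directory, and that the compressed form is the same name plus .gz (which is what gzip itself produces). The function names are hypothetical; this is not Lintian's actual migration code.

```python
import gzip
import os
import shutil

# Assumed collection file names, taken from the bug title.
TARGETS = {"file-info", "index"}


def compress_in_place(path: str) -> str:
    """Gzip PATH to PATH.gz (like `gzip FILE`) and remove the original."""
    dest = path + ".gz"
    tmp = dest + ".tmp"
    with open(path, "rb") as src, gzip.open(tmp, "wb", compresslevel=9) as dst:
        shutil.copyfileobj(src, dst)
    os.replace(tmp, dest)  # publish the .gz only once it is fully written
    os.remove(path)
    return dest


def migrate_lab(root: str) -> list[str]:
    """Compress every still-uncompressed target collection file under ROOT."""
    converted = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name in TARGETS:
                converted.append(compress_in_place(os.path.join(dirpath, name)))
    return converted
```

Writing to a temporary name first means an interrupted run never leaves a truncated .gz next to a deleted original.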

~Niels





Information forwarded to debian-bugs-dist@lists.debian.org, Debian Lintian Maintainers <lintian-maint@debian.org>:
Bug#664794; Package src:lintian. (Wed, 21 Mar 2012 16:30:03 GMT).


Acknowledgement sent to Russ Allbery <rra@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Lintian Maintainers <lintian-maint@debian.org>. (Wed, 21 Mar 2012 16:30:03 GMT).


Message #20 received at 664794@bugs.debian.org:

From: Russ Allbery <rra@debian.org>
To: 664794@bugs.debian.org
Subject: Re: Bug#664794: lintian: should we compress some collections (file-info and index)?
Date: Wed, 21 Mar 2012 09:28:06 -0700
Niels Thykier <niels@thykier.net> writes:

> True, but it kind of implies that they are aware of changes we make in
> the Lab. :)

True.  Well, we could say something in debian-devel-announce.

> Though it leaves the question of how to migrate from uncompressed to
> compressed.  If we do "compressed"-only we have to do a full run (or a
> find -name | xargs gzip).  I guess that is reasonable to do, we just
> need to tell people maintaining lintian.$domain.$tld to do the same.
>   Alternatively, we can bump the version of these collections and have
> Lintian slowly migrate as packages are (re-checked), but that means the
> (non-Lintian) access will be inconsistent until all packages have been
> re-checked.

Ah, yeah, the migration is an issue.  My inclination would be to go with
the latter approach and let things be inconsistent for non-Lintian users
for a while, since inevitably we'll want to do a full archive run to pick
up some new set of tags and then it will all get fixed.
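While the lab is in that mixed state, ad-hoc "grep -r" users would see a blend of plain and compressed files. A hypothetical Python sketch of a search helper that tolerates both, assuming compressed files carry a .gz suffix on the original collection name; grep_lab and open_collection are illustrative names, not part of Lintian.

```python
import gzip
import os
import re
from typing import IO, Iterator


def open_collection(path: str) -> IO[str]:
    """Open a collection file as text, transparently gunzipping *.gz."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "r", encoding="utf-8", errors="replace")


def grep_lab(root: str, pattern: str, names: set[str]) -> Iterator[tuple[str, int, str]]:
    """Yield (path, line number, line) for each match in files whose base
    name, minus an optional .gz suffix, is in NAMES."""
    regex = re.compile(pattern)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            base = name[:-3] if name.endswith(".gz") else name
            if base not in names:
                continue
            path = os.path.join(dirpath, name)
            with open_collection(path) as fh:
                for lineno, line in enumerate(fh, 1):
                    if regex.search(line):
                        yield path, lineno, line.rstrip("\n")
```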

-- 
Russ Allbery (rra@debian.org)               <http://www.eyrie.org/~eagle/>




Information forwarded to debian-bugs-dist@lists.debian.org, Debian Lintian Maintainers <lintian-maint@debian.org>:
Bug#664794; Package src:lintian. (Sun, 25 Mar 2012 19:09:04 GMT).


Message #23 received at 664794@bugs.debian.org:

From: Niels Thykier <niels@thykier.net>
To: debian-lint-maint@lists.debian.org
Subject: Re: Bug#664794: lintian: should we compress some collections (file-info and index)?
Date: Sun, 25 Mar 2012 21:02:36 +0200
On 2012-03-20 23:25, Russ Allbery wrote:
> Niels Thykier <niels@thykier.net> writes:
> 
>> I have been considering if it would be a good idea to (conditionally?)
>> compress certain collection files.  In some cases they are actually
>> rather large and I suspect compression will generally be good in such
>> cases[1].  Admittedly, there are also cases where it gives little to no
>> size reduction.
> 
> Compressing some stuff is not a bad idea.  The indices and file-info
> collections seem like the most obvious targets.  People doing greps can
> switch to zgreps.
> 
> I would prefer to never conditionally compress anything; either always
> compress it or never compress it.  That way, the file names and access
> method are always consistent.
> 

Okay, I have committed the changes for compressing index + file-info.
As a side-effect, I compressed the control-index as well
(bin-pkg-control) to keep the L::Collect side simple.

I had a look at some other candidates, and I am thinking of java-info,
copyright-file and md5sums.  However, as it is, we sometimes just leave
an empty file for these collections (if there is no information etc.).
For copyright-file and java-info this is probably going to be the common
case (symlinked u/s/d/$pkg and no jar files, respectively).

My personal view is that we could do without the empty files and then
only leave a file if there is any information.  It will probably require
some changes to checks (or collections) that access these directly, but
I think we should take that as an opportunity to improve (the usage
of) L::Collect. :)
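A hypothetical sketch of the "only leave a file if there is any information" idea, combined with always-compressed storage: the writer drops the file when a collection produces nothing, and the reader treats a missing file as "no information". The function names and the .gz naming are assumptions for illustration, not Lintian's L::Collect API.

```python
import gzip
import os


def write_collection(path: str, data: bytes) -> bool:
    """Write DATA gzip-compressed to PATH, but only if there is any data.

    Returns True if a file was written. An empty result leaves no file
    behind and removes a stale one from an earlier run."""
    if not data:
        try:
            os.remove(path)
        except FileNotFoundError:
            pass
        return False
    with gzip.open(path, "wb") as fh:
        fh.write(data)
    return True


def read_collection(path: str) -> bytes:
    """Read a collection; a missing file means there was no information."""
    try:
        with gzip.open(path, "rb") as fh:
            return fh.read()
    except FileNotFoundError:
        return b""
```

Checks would then go through read_collection rather than testing for an empty file themselves, which is where the changes to direct accessors would come in.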

~Niels



-- 
Archive: http://lists.debian.org/4F6F6BCC.1090607@thykier.net





Reply sent to Niels Thykier <niels@thykier.net>:
You have taken responsibility. (Wed, 02 Jan 2013 22:36:07 GMT).


Notification sent to Niels Thykier <niels@thykier.net>:
Bug acknowledged by developer. (Wed, 02 Jan 2013 22:36:07 GMT).


Message #28 received at 664794-done@bugs.debian.org:

From: Niels Thykier <niels@thykier.net>
To: Niels Thykier <niels@thykier.net>, 664794-done@bugs.debian.org
Subject: Re: Bug#664794: lintian: should we compress some collections (file-info and index)?
Date: Wed, 02 Jan 2013 23:34:10 +0100
Version: 2.5.7

Started in 2.5.7; additional compression happened in some later versions.

On 2012-03-20 23:01, Niels Thykier wrote:
> Source: lintian
> Version: 2.5.6
> Severity: wishlist
> 
> [...]
> 
> ~Niels
> 
> For reference, the size of the lab pool is 16 (13) GB according to
>  du -sh (--apparent-size)
> 
> [...]

Today, our Laboratory size has dropped to 10 GB according to du -csh
(6.1 GB with --apparent-size).  And this is despite the fact that we now
also process experimental (~10% extra packages).  Yummy!

~Niels




Bug archived. Request was from Debbugs Internal Request <owner@bugs.debian.org> to internal_control@bugs.debian.org. (Thu, 31 Jan 2013 07:28:40 GMT).




Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Sun Nov 19 12:50:13 2023; Machine Name: buxtehude
