Debian Bug report logs -
#664794
lintian: should we compress some collections (file-info and index)?
Reported by: Niels Thykier <niels@thykier.net>
Date: Tue, 20 Mar 2012 22:03:02 UTC
Severity: wishlist
Found in version lintian/2.5.6
Fixed in version 2.5.7
Done: Niels Thykier <niels@thykier.net>
Bug is archived. No further changes may be made.
Information forwarded
to debian-bugs-dist@lists.debian.org, niels@thykier.net, Debian Lintian Maintainers <lintian-maint@debian.org>:
Bug#664794; Package src:lintian.
(Tue, 20 Mar 2012 22:03:05 GMT) (full text, mbox, link).
Acknowledgement sent
to Niels Thykier <niels@thykier.net>:
Extra info received and forwarded to list. Copy sent to niels@thykier.net, Debian Lintian Maintainers <lintian-maint@debian.org>.
(Tue, 20 Mar 2012 22:03:05 GMT) (full text, mbox, link).
Message #5 received at submit@bugs.debian.org (full text, mbox, reply):
Source: lintian
Version: 2.5.6
Severity: wishlist
I have been considering if it would be a good idea to (conditionally?)
compress certain collection files. In some cases they are actually
rather large and I suspect compression will generally be good in such
cases[1]. Admittedly, there are also cases where it gives little to
no size reduction.
Code-wise, we should be able to enable this for file-info without any
greater hassle than updating L::Collect::Package and coll/file-info.
There are still some "ad-hoc" index parsers left in coll/*, but they
should be fairly straightforward to fix.
However, there have been people doing things like "grep -r $expression"
on the lab in the past[2] and compression could break some of these.
~Niels
For reference, the size of the lab pool is 16 (13) GB according to
du -sh (--apparent-size)
[1]
$ wc -c < e/eclipse/eclipse_3.7.2-1_source/file-info
4600773
$ gzip --best -c < e/eclipse/eclipse_3.7.2-1_source/file-info | wc -c
277286
$ wc -c < e/eclipse/eclipse_3.7.2-1_source/index
5462164
$ gzip --best -c < e/eclipse/eclipse_3.7.2-1_source/index | wc -c
390669
[2] Such as (but not limited to) the Policy Maintainers :)
http://anonscm.debian.org/gitweb/?p=dbnpolicy/policy.git;a=blob;f=tools/license-count
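The measurement in [1] can be repeated for any collection file with a small helper. This is only a sketch and not part of Lintian: the function name gz_ratio is made up for illustration, and it uses nothing beyond wc, gzip and awk.

```shell
#!/bin/sh
# Hypothetical helper (not part of Lintian): print the original size, the
# "gzip --best" size, and the compressed size as a percentage of the original.
gz_ratio() {
    file=$1
    # $(( ... )) strips the leading blanks some wc implementations emit.
    orig=$(($(wc -c < "$file")))
    comp=$(($(gzip --best -c < "$file" | wc -c)))
    # awk does the floating-point division; empty files report 0.
    pct=$(awk -v o="$orig" -v c="$comp" \
        'BEGIN { printf "%.1f", ((o > 0) ? c * 100 / o : 0) }')
    printf '%s: %d -> %d bytes (%s%% of original)\n' \
        "$file" "$orig" "$comp" "$pct"
}
```

For the eclipse figures in [1], 277286 / 4600773 is about 6.0% of the original size for file-info, and 390669 / 5462164 is about 7.2% for index.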
Information forwarded
to debian-bugs-dist@lists.debian.org, Debian Lintian Maintainers <lintian-maint@debian.org>:
Bug#664794; Package src:lintian.
(Tue, 20 Mar 2012 22:27:05 GMT) (full text, mbox, link).
Acknowledgement sent
to Russ Allbery <rra@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Lintian Maintainers <lintian-maint@debian.org>.
(Tue, 20 Mar 2012 22:27:05 GMT) (full text, mbox, link).
Message #10 received at 664794@bugs.debian.org (full text, mbox, reply):
Niels Thykier <niels@thykier.net> writes:
> I have been considering if it would be a good idea to (conditionally?)
> compress certain collection files. In some cases they are actually
> rather large and I suspect compression will generally be good in such
> cases[1]. Admittedly, there are also cases where it gives little to no
> size reduction.
Compressing some stuff is not a bad idea. The indices and file-info
collections seem like the most obvious targets. People doing greps can
switch to zgreps.
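As a rough illustration of the switch from grep to zgrep, a recursive search over gzip-compressed collection files could look like the sketch below. The function name lab_zgrep and the ".gz" suffix on the file names are assumptions for illustration; nothing in this thread says how the compressed files would be named.

```shell
#!/bin/sh
# Hypothetical stand-in for "grep -r EXPR LAB" once the collection files are
# gzip-compressed. Assumes the compressed files end in ".gz".
lab_zgrep() {
    expr=$1
    lab=$2
    # zgrep has no portable recursive mode, so find walks the tree instead.
    # -H prints the file name even if only one file is passed to zgrep.
    find "$lab" -type f -name '*.gz' -exec zgrep -H -e "$expr" {} +
}
```

A call such as lab_zgrep "$expression" /path/to/lab would then take the place of the earlier grep -r invocation; the lab path here is a placeholder.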
I would prefer to never conditionally compress anything; either always
compress it or never compress it. That way, the file names and access
method are always consistent.
--
Russ Allbery (rra@debian.org) <http://www.eyrie.org/~eagle/>
Information forwarded
to debian-bugs-dist@lists.debian.org, Debian Lintian Maintainers <lintian-maint@debian.org>:
Bug#664794; Package src:lintian.
(Wed, 21 Mar 2012 08:09:03 GMT) (full text, mbox, link).
Acknowledgement sent
to Niels Thykier <niels@thykier.net>:
Extra info received and forwarded to list. Copy sent to Debian Lintian Maintainers <lintian-maint@debian.org>.
(Wed, 21 Mar 2012 08:09:03 GMT) (full text, mbox, link).
Message #15 received at 664794@bugs.debian.org (full text, mbox, reply):
On 2012-03-20 23:25, Russ Allbery wrote:
> Niels Thykier <niels@thykier.net> writes:
>
>> I have been considering if it would be a good idea to (conditionally?)
>> compress certain collection files. In some cases they are actually
>> rather large and I suspect compression will generally be good in such
>> cases[1]. Admittedly, there are also cases where it gives little to no
>> size reduction.
>
> Compressing some stuff is not a bad idea. The indices and file-info
> collections seem like the most obvious targets. People doing greps can
> switch to zgreps.
>
True, but it kind of implies that they are aware of changes we make in
the Lab. :)
> I would prefer to never conditionally compress anything; either always
> compress it or never compress it. That way, the file names and access
> method are always consistent.
>
Originally I had thought of reusing _open_data_file (from harness) to
access the file(s). But I do see a point in making the access
consistent (especially for people doing "grep -r" checks).
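To make the trade-off concrete: a reader that accepts either form has to probe for both names, roughly as in the sketch below. This is not the harness's _open_data_file, whose behaviour is not shown in this report; the function name cat_collection and the ".gz" suffix are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical reader that tolerates both storage forms: print the named
# collection file whether it exists as PATH.gz or as plain PATH.
cat_collection() {
    path=$1
    if [ -f "$path.gz" ]; then
        gzip -dc "$path.gz"
    elif [ -f "$path" ]; then
        cat "$path"
    else
        echo "cat_collection: no such collection file: $path" >&2
        return 1
    fi
}
```

Every external tool would need the same kind of fallback, which is the inconsistency that an always-compress (or never-compress) rule avoids.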
Though it leaves the question of how to migrate from uncompressed to
compressed. If we do "compressed"-only we have to do a full run (or a
find -name | xargs gzip). I guess that is reasonable to do, we just
need to tell people maintaining lintian.$domain.$tld to do the same.
Alternatively, we can bump the version of these collections and have
Lintian slowly migrate as packages are (re-)checked, but that means the
(non-Lintian) access will be inconsistent until all packages have been
re-checked.
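The one-shot migration ("find -name | xargs gzip") might be spelled out roughly as follows. This is a sketch under assumptions: the function name compress_collections is hypothetical, the file names "index" and "file-info" are taken from the bug title rather than from any actual commit, and find -exec is used in place of the xargs pipeline so that unusual file names and an empty match list are handled without extra options.

```shell
#!/bin/sh
# Hypothetical one-shot migration: gzip every uncompressed "index" and
# "file-info" file below the given lab directory, in place.
# gzip replaces each FILE with FILE.gz.
compress_collections() {
    lab=$1
    find "$lab" -type f \( -name index -o -name file-info \) \
        -exec gzip --best {} +
}
```

Whoever maintains another lintian.$domain.$tld instance would have to run the same step once against their own lab.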
~Niels
Information forwarded
to debian-bugs-dist@lists.debian.org, Debian Lintian Maintainers <lintian-maint@debian.org>:
Bug#664794; Package src:lintian.
(Wed, 21 Mar 2012 16:30:03 GMT) (full text, mbox, link).
Acknowledgement sent
to Russ Allbery <rra@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Lintian Maintainers <lintian-maint@debian.org>.
(Wed, 21 Mar 2012 16:30:03 GMT) (full text, mbox, link).
Message #20 received at 664794@bugs.debian.org (full text, mbox, reply):
Niels Thykier <niels@thykier.net> writes:
> True, but it kind of implies that they are aware of changes we make in
> the Lab. :)
True. Well, we could say something in debian-devel-announce.
> Though it leaves the question of how to migrate from uncompressed to
> compressed. If we do "compressed"-only we have to do a full run (or a
> find -name | xargs gzip). I guess that is reasonable to do, we just
> need to tell people maintaining lintian.$domain.$tld to do the same.
> Alternatively, we can bump the version of these collections and have
> Lintian slowly migrate as packages are (re-)checked, but that means the
> (non-Lintian) access will be inconsistent until all packages have been
> re-checked.
Ah, yeah, the migration is an issue. My inclination would be to go with
the latter approach and let things be inconsistent for non-Lintian users
for a while, since inevitably we'll want to do a full archive run to pick
up some new set of tags and then it will all get fixed.
--
Russ Allbery (rra@debian.org) <http://www.eyrie.org/~eagle/>
Information forwarded
to debian-bugs-dist@lists.debian.org, Debian Lintian Maintainers <lintian-maint@debian.org>:
Bug#664794; Package src:lintian.
(Sun, 25 Mar 2012 19:09:04 GMT) (full text, mbox, link).
Message #23 received at 664794@bugs.debian.org (full text, mbox, reply):
On 2012-03-20 23:25, Russ Allbery wrote:
> Niels Thykier <niels@thykier.net> writes:
>
>> I have been considering if it would be a good idea to (conditionally?)
>> compress certain collection files. In some cases they are actually
>> rather large and I suspect compression will generally be good in such
>> cases[1]. Admittedly, there are also cases where it gives little to no
>> size reduction.
>
> Compressing some stuff is not a bad idea. The indices and file-info
> collections seem like the most obvious targets. People doing greps can
> switch to zgreps.
>
> I would prefer to never conditionally compress anything; either always
> compress it or never compress it. That way, the file names and access
> method are always consistent.
>
Okay, I have committed the changes for compressing index + file-info.
As a side-effect, I compressed the control-index as well
(bin-pkg-control) to keep L::Collect side simple.
I had a look at some other candidates and I am thinking of java-info,
copyright-file and md5sums. However, as it is we sometimes just leave
an empty file for these collections (if there is no information etc.).
For copyright-file and java-info this is probably going to be the common
case (symlinked u/s/d/$pkg and no jar files respectively).
My personal view is that we could do without the empty files and then
only leave a file if there is any information. It will probably require
some changes to checks (or collections) that access these directly, but
I think we should take that as an opportunity to improve (the usage
of) L::Collect. :)
~Niels
Reply sent
to Niels Thykier <niels@thykier.net>:
You have taken responsibility.
(Wed, 02 Jan 2013 22:36:07 GMT) (full text, mbox, link).
Notification sent
to Niels Thykier <niels@thykier.net>:
Bug acknowledged by developer.
(Wed, 02 Jan 2013 22:36:07 GMT) (full text, mbox, link).
Message #28 received at 664794-done@bugs.debian.org (full text, mbox, reply):
Version: 2.5.7
Started in 2.5.7; additional compression happened in some later versions.
On 2012-03-20 23:01, Niels Thykier wrote:
> Source: lintian
> Version: 2.5.6
> Severity: wishlist
>
> [...]
>
> ~Niels
>
> For reference, the size of the lab pool is 16 (13) GB according to
> du -sh (--apparent-size)
>
> [...]
Today, our Laboratory size has dropped to 10 (6.1) GB according to du
-csh (--apparent-size). And this is despite the fact that we now also
process experimental (~10% extra packages). Yummy!
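For a sense of scale, the drop can be expressed as a percentage with a one-line awk calculation. This is only illustrative arithmetic on the figures quoted in this report; the function name reduction_pct is made up for the example.

```shell
#!/bin/sh
# Percentage reduction between two sizes (both arguments in the same unit).
reduction_pct() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (1 - after / before) * 100 }'
}

reduction_pct 16 10     # du -sh figures, in GB
reduction_pct 13 6.1    # --apparent-size figures, in GB
```

That works out to about 37.5% less disk usage and about 53.1% less apparent size, and the comparison is not like-for-like because the newer lab also covers experimental.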
~Niels
Bug archived.
Request was from Debbugs Internal Request <owner@bugs.debian.org>
to internal_control@bugs.debian.org.
(Thu, 31 Jan 2013 07:28:40 GMT) (full text, mbox, link).
Debian bug tracking system administrator <owner@bugs.debian.org>.
Last modified:
Sun Nov 19 12:50:13 2023;
Machine Name:
buxtehude