Debian Bug report logs - #647522
gzip -9n is not deterministic

Package: gzip; Maintainer for gzip is Bdale Garbee <bdale@gag.com>; Source for gzip is src:gzip.

Reported by: Jakub Wilk <jwilk@debian.org>

Date: Thu, 3 Nov 2011 14:48:02 UTC

Severity: normal

Found in version gzip/1.4-1

Fixed in version gzip/1.4-5

Done: Bdale Garbee <bdale@gag.com>

Bug is archived. No further changes may be made.


Report forwarded to debian-bugs-dist@lists.debian.org, jwilk@debian.org, vorlon@debian.org, Bdale Garbee <bdale@gag.com>:
Bug#647522; Package gzip. (Thu, 03 Nov 2011 14:48:05 GMT) Full text and rfc822 format available.

Message #3 received at submit@bugs.debian.org (full text, mbox):

From: Jakub Wilk <jwilk@debian.org>
To: Debian Bug Tracking System <submit@bugs.debian.org>
Subject: gzip -9n is not deterministic
Date: Thu, 3 Nov 2011 15:45:25 +0100
[Message part 1 (text/plain, inline)]
Package: gzip
Version: 1.4-1
Severity: normal

While doing some QA work on Multi-Arch:same packages, we noticed that 
gzip output is not always deterministic, even when using -n. The 
attached examples were all produced on buildds[0], using gzip 1.4-1 with 
-9n:

$ md5sum *.gz
85d8367f827c0b92872854b2aeb80643  libxv-dev_1.0.6-2_i386_XvQueryExtension.3.gz
46d600750554b57ebb7eba06df10497b  libxv-dev_1.0.6-2_sparc_XvQueryExtension.3.gz
a6306103e0605a4513787729712cfcf4  pam_get_user_libpam0g-dev_1.1.3-5_i386.gz
7ab04b549fad059131df5af6bb8d9cd0  pam_get_user_libpam0g-dev_1.1.3-5_mipsel.gz
bb372b987823b13a7a93a5db8790341a  pam_get_user_libpam0g-dev_1.1.3-5_s390.gz

$ for gz in *.gz; do gzip -dc < $gz | gzip -9n > ${gz%.gz}.regz; done

$ md5sum *.regz
85d8367f827c0b92872854b2aeb80643  libxv-dev_1.0.6-2_i386_XvQueryExtension.3.regz
85d8367f827c0b92872854b2aeb80643  libxv-dev_1.0.6-2_sparc_XvQueryExtension.3.regz
a6306103e0605a4513787729712cfcf4  pam_get_user_libpam0g-dev_1.1.3-5_i386.regz
a6306103e0605a4513787729712cfcf4  pam_get_user_libpam0g-dev_1.1.3-5_mipsel.regz
a6306103e0605a4513787729712cfcf4  pam_get_user_libpam0g-dev_1.1.3-5_s390.regz

The bug seems to trigger most often on amd64 and ia64, though sometimes 
it also pops up on other architectures.

It was originally reported in Ubuntu: 
https://bugs.launchpad.net/ubuntu/+source/pam/+bug/871083

[0] 
https://buildd.debian.org/status/fetch.php?pkg=libxv&arch=i386&ver=2%3A1.0.6-2&stamp=1313087510
https://buildd.debian.org/status/fetch.php?pkg=libxv&arch=sparc&ver=2%3A1.0.6-2&stamp=1313090905
https://buildd.debian.org/status/fetch.php?pkg=pam&arch=i386&ver=1.1.3-5&stamp=1319817965
https://buildd.debian.org/status/fetch.php?pkg=pam&arch=mipsel&ver=1.1.3-5&stamp=1319818484
https://buildd.debian.org/status/fetch.php?pkg=pam&arch=s390&ver=1.1.3-5&stamp=1319820034

-- 
Jakub Wilk
[libxv-dev_1.0.6-2_i386_XvQueryExtension.3.gz (application/octet-stream, attachment)]
[libxv-dev_1.0.6-2_sparc_XvQueryExtension.3.gz (application/octet-stream, attachment)]
[pam_get_user_libpam0g-dev_1.1.3-5_i386.gz (application/octet-stream, attachment)]
[pam_get_user_libpam0g-dev_1.1.3-5_mipsel.gz (application/octet-stream, attachment)]
[pam_get_user_libpam0g-dev_1.1.3-5_s390.gz (application/octet-stream, attachment)]
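
The check in the report above (same payload, different compressed bytes) can be sketched in Python. This is only an illustration: the function names are hypothetical, SHA-256 is used where the report used md5sum, and Python's gzip module wraps zlib rather than GNU gzip, so it can detect divergent files but cannot reproduce the output of gzip -9n.

```python
import gzip
import hashlib
from collections import defaultdict


def group_by_payload(blobs):
    """Map digest of the decompressed payload -> set of digests of the .gz bytes.

    `blobs` maps a file name to the raw bytes of a gzip file.
    """
    groups = defaultdict(set)
    for _name, blob in blobs.items():
        payload_digest = hashlib.sha256(gzip.decompress(blob)).hexdigest()
        groups[payload_digest].add(hashlib.sha256(blob).hexdigest())
    return dict(groups)


def divergent_payloads(blobs):
    """Return payload digests that were seen with more than one compressed form."""
    return {p for p, compressed in group_by_payload(blobs).items() if len(compressed) > 1}
```

A non-empty result from divergent_payloads() corresponds to the situation shown by the two md5sum listings: identical content after decompression, but .gz files that differ.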

Added indication that bug 647522 blocks 647283 Request was from Jakub Wilk <jwilk@debian.org> to control@bugs.debian.org. (Thu, 03 Nov 2011 15:42:10 GMT) Full text and rfc822 format available.

Information forwarded to debian-bugs-dist@lists.debian.org, Bdale Garbee <bdale@gag.com>:
Bug#647522; Package gzip. (Thu, 24 Nov 2011 10:21:04 GMT) Full text and rfc822 format available.

Acknowledgement sent to Michael Neuffer <neuffer@neuffer.info>:
Extra info received and forwarded to list. Copy sent to Bdale Garbee <bdale@gag.com>. (Thu, 24 Nov 2011 10:21:09 GMT) Full text and rfc822 format available.

Message #10 received at 647522@bugs.debian.org (full text, mbox):

From: Michael Neuffer <neuffer@neuffer.info>
To: 647522@bugs.debian.org
Subject: This bug should be upgraded to critical
Date: Thu, 24 Nov 2011 11:13:05 +0100
as it pops up all over the place, affecting many packages in unstable 
and rendering the upgrade of multiarch installations next to impossible,
as it is being turned into a major minefield.

Information forwarded to debian-bugs-dist@lists.debian.org, Bdale Garbee <bdale@gag.com>:
Bug#647522; Package gzip. (Thu, 08 Dec 2011 10:30:15 GMT) Full text and rfc822 format available.

Acknowledgement sent to Riku Voipio <riku.voipio@iki.fi>:
Extra info received and forwarded to list. Copy sent to Bdale Garbee <bdale@gag.com>. (Thu, 08 Dec 2011 10:30:33 GMT) Full text and rfc822 format available.

Message #15 received at 647522@bugs.debian.org (full text, mbox):

From: Riku Voipio <riku.voipio@iki.fi>
To: bug-gzip@gnu.org
Cc: 647522@bugs.debian.org
Subject: non-deterministic compression results with gzip -n9
Date: Thu, 8 Dec 2011 12:27:38 +0200
[Message part 1 (text/plain, inline)]
Hi (please CC:, not on list),

It has been observed in Debian and Ubuntu that the results of gzip -n9
are not deterministic. This appears to be a heisenbug that is not usually
reproducible. The attached two files decompress to the same file. Both files
were created on Ubuntu with gzip 1.3.12-9ubuntu1.1, but similar behaviour
has been observed with gzip 1.4 on Ubuntu and Debian. The bigger file was created
on an x86_64 system, but on my x86_64 system I haven't been able to reproduce
it. The differences appear in the last 10 bytes:

$ cmp -l ChangeLog.pre-2-2.amd64.gz ChangeLog.pre-2-2.armel.gz
17375 126 122
17376 327  13
17377 327 377
17378 377  37
17379  37 155
17380 155  57
17381  57 155
17382 155  56
17383  56 324
17384 324 301
17385 301   0
cmp: EOF on ChangeLog.pre-2-2.armel.gz

The other sample is different only on the last 3 bytes:

$ cmp -l NEWS.amd64.gz NEWS.armel.gz
5110 207 367
5111  56 120
5113  13  27

According to the gzip RFC, the last 4 bytes are ISIZE, which should be
the uncompressed input size. That leaves me rather baffled as to how it can
differ for the same input files, and how gunzip is completely happy with
both versions of the compressed file, producing the same output.

Any help debugging deeper would be appreciated.
Riku

RFC:

http://www.gzip.org/zlib/rfc-gzip.html

Original bugreport:

https://bugs.launchpad.net/ubuntu/+source/gzip/+bug/871083
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=647522
[ChangeLog.pre-2-2.amd64.gz (application/octet-stream, attachment)]
[ChangeLog.pre-2-2.armel.gz (application/octet-stream, attachment)]
[NEWS.amd64.gz (application/octet-stream, attachment)]
[NEWS.armel.gz (application/octet-stream, attachment)]

Information forwarded to debian-bugs-dist@lists.debian.org, Bdale Garbee <bdale@gag.com>:
Bug#647522; Package gzip. (Fri, 09 Dec 2011 20:24:07 GMT) Full text and rfc822 format available.

Acknowledgement sent to Paul Eggert <eggert@cs.ucla.edu>:
Extra info received and forwarded to list. Copy sent to Bdale Garbee <bdale@gag.com>. (Fri, 09 Dec 2011 20:24:07 GMT) Full text and rfc822 format available.

Message #20 received at 647522@bugs.debian.org (full text, mbox):

From: Paul Eggert <eggert@cs.ucla.edu>
To: Riku Voipio <riku.voipio@iki.fi>
Cc: bug-gzip@gnu.org, 647522@bugs.debian.org
Subject: Re: non-deterministic compression results with gzip -n9
Date: Fri, 09 Dec 2011 12:16:21 -0800
I should add that it's OK (from the point of view of
the RFCs) if gzip produces different outputs given the same
inputs when compressing.  The RFCs allow that and presumably
other gzip implementations do that.  All that's required is
that compress+decompress result in a copy of the original.

That being said, it's nicer if gzip is deterministic and it'd
be helpful to get to the bottom of this and make
the nondeterminism go away in future versions of gzip.




Information forwarded to debian-bugs-dist@lists.debian.org, Bdale Garbee <bdale@gag.com>:
Bug#647522; Package gzip. (Fri, 09 Dec 2011 20:24:09 GMT) Full text and rfc822 format available.

Acknowledgement sent to Paul Eggert <eggert@cs.ucla.edu>:
Extra info received and forwarded to list. Copy sent to Bdale Garbee <bdale@gag.com>. (Fri, 09 Dec 2011 20:24:09 GMT) Full text and rfc822 format available.

Message #25 received at 647522@bugs.debian.org (full text, mbox):

From: Paul Eggert <eggert@cs.ucla.edu>
To: Riku Voipio <riku.voipio@iki.fi>
Cc: bug-gzip@gnu.org, 647522@bugs.debian.org
Subject: Re: non-deterministic compression results with gzip -n9
Date: Fri, 09 Dec 2011 12:10:07 -0800
On 12/08/11 02:27, Riku Voipio wrote:
> According to the gzip RFC, the last 4 bytes are ISIZE, which should be
> the uncompressed input size. That leaves me rather baffled as to how it can
> differ for the same input files, and how gunzip is completely happy with
> both versions of the compressed file, producing the same output.

I looked only at NEWS.amd64.gz and NEWS.armel.gz.  For those two files
your diagnosis does not seem to be right, as these two files do not
differ in the last 4 bytes, but in bytes before then:

$ od -tx1 <NEWS.amd64.gz >NEWS.amd64.gz.od
$ od -tx1 <NEWS.armel.gz >NEWS.armel.gz.od
$ diff -u NEWS.*.gz.od
--- NEWS.amd64.gz.od	2011-12-09 12:03:41.090594754 -0800
+++ NEWS.armel.gz.od	2011-12-09 12:03:57.298663371 -0800
@@ -317,6 +317,6 @@
 0011700 fa 9f da 9b 92 ad 57 44 19 45 c5 42 e5 b6 d9 c2
 0011720 7e 80 02 bd 58 94 33 74 ba 0a 62 24 52 7b 35 33
 0011740 b2 87 51 76 b7 af cc 7f 09 b0 2d 14 d6 8d f9 4d
-0011760 94 51 39 49 cd 87 2e ff 0b 5a 6c 8d f6 80 35 00
+0011760 94 51 39 49 cd f7 50 ff 17 5a 6c 8d f6 80 35 00
 0012000 00
 0012001

So the differences are not in ISIZE.  Can you track down
what's actually differing and why?  (That would save some
time for me when debugging....)  Thanks.
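
The offset arithmetic behind this observation can be sketched as follows. In a single-member gzip file the last 8 bytes are CRC32 followed by ISIZE, both little-endian (RFC 1952). The function names are hypothetical; the numbers are taken from the od and cmp -l output quoted in this thread.

```python
import gzip
import struct
import zlib


def read_trailer(blob):
    """Return (CRC32, ISIZE) from the last 8 bytes of a single-member gzip file."""
    return struct.unpack("<II", blob[-8:])


def trailer_matches_payload(blob):
    """Check the trailer against the CRC32 and length of the decompressed data."""
    payload = gzip.decompress(blob)
    return read_trailer(blob) == (zlib.crc32(payload), len(payload) % 2**32)


def region(offset, total_len):
    """Classify a 1-based byte offset, as printed by `cmp -l`."""
    if offset > total_len - 4:
        return "ISIZE"
    if offset > total_len - 8:
        return "CRC32"
    return "before trailer"


# The od dump of both NEWS files ends at octal 0012001, i.e. 5121 bytes,
# and cmp -l reported differences at offsets 5110, 5111 and 5113.
NEWS_LEN = 0o12001
NEWS_DIFFS = [5110, 5111, 5113]
NEWS_REGIONS = [region(off, NEWS_LEN) for off in NEWS_DIFFS]
```

With a length of 5121 bytes, ISIZE occupies offsets 5118 to 5121 and CRC32 occupies 5114 to 5117, so all three differing offsets fall before the trailer, which is consistent with both files decompressing to the same data.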




Information forwarded to debian-bugs-dist@lists.debian.org, Bdale Garbee <bdale@gag.com>:
Bug#647522; Package gzip. (Mon, 06 Feb 2012 20:51:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to Neil Williams <codehelp@debian.org>:
Extra info received and forwarded to list. Copy sent to Bdale Garbee <bdale@gag.com>. (Mon, 06 Feb 2012 20:51:03 GMT) Full text and rfc822 format available.

Message #30 received at 647522@bugs.debian.org (full text, mbox):

From: Neil Williams <codehelp@debian.org>
To: 647522@bugs.debian.org, bug-gzip@gnu.org
Cc: Riku Voipio <riku.voipio@iki.fi>, Paul Eggert <eggert@cs.ucla.edu>
Subject: non-deterministic compression for CREDITS.gz in libppl9 amd64 & armel
Date: Mon, 6 Feb 2012 20:49:12 +0000
[Message part 1 (text/plain, inline)]
$ dget http://ftp.uk.debian.org/debian/pool/main/p/ppl/libppl9_0.11.2-6_armel.deb
$ dpkg -X libppl9_0.11.2-6_armel.deb .
$ cp ./usr/share/doc/libppl9/CREDITS.gz .

$ md5sum CREDITS.gz 
0e52e84eebf41588865742edaff7b3c0  CREDITS.gz
$ gunzip CREDITS.gz 
$ gzip -9nf CREDITS 
$ md5sum CREDITS.gz 
99e2b9f8972ce00cfe57e3735881015e  CREDITS.gz

This test was done on abel.debian.org - an armel machine using gzip
1.3.12-9 but the original armel package was built using gzip 1.4-1 on
the buildd.

(This md5sum matches that of the same file in the amd64 package.)
http://ftp.uk.debian.org/debian/pool/main/p/ppl/libppl9_0.11.2-6_amd64.deb
99e2b9f8972ce00cfe57e3735881015e  usr/share/doc/libppl9/CREDITS.gz

$ od -tx1 < CREDITS.gz > CREDITS-redone.gz.od
$ od -tx1 < ./usr/share/doc/libppl9/CREDITS.gz > CREDITS-original.gz.od
$ diff -u CREDITS-original.gz.od CREDITS-redone.gz.od
--- CREDITS-original.gz.od	2012-02-06 20:34:43.000000000 +0000
+++ CREDITS-redone.gz.od	2012-02-06 20:34:29.000000000 +0000
@@ -393,6 +393,6 @@
 0014200 16 78 3d a3 79 3d 1c f0 b0 c2 5f e9 f6 0b 5b 4c
 0014220 77 8b 91 89 1d 13 b7 58 16 f3 5b 10 1e 20 d1 f3
 0014240 d3 44 79 f2 05 9a 9c e7 87 42 b9 b5 34 42 56 55
-0014260 95 a1 bb 55 ec 78 cb f2 7f ba 11 41 f7 b3 d4 0f
-0014300 d6 b0 a7 11 7b 4c 00 00
-0014310
+0014260 95 a1 bb 55 ec 78 cb f2 7f ba 11 41 f7 ea 3f d6
+0014300 b0 a7 11 7b 4c 00 00
+0014307

i.e. the armel package contains the anomalous file but it can be
converted to the same file as the amd64 package by redoing the
compression.

The manifestation of this bug is clear when trying to install the
MultiArch build-dependencies for cross-compilers:

$ sudo apt-get install libcloog-ppl-dev:armel

Selecting previously unselected package libppl9:armel.
(Reading database ... 167711 files and directories currently installed.)
Unpacking libppl9:armel (from .../libppl9_0.11.2-6_armel.deb) ...
dpkg: error processing /var/cache/apt/archives/libppl9_0.11.2-6_armel.deb (--unpack):
 './usr/share/doc/libppl9/CREDITS.gz' is different from the same file on the system


-- 


Neil Williams
=============
http://www.linux.codehelp.co.uk/

[Message part 2 (application/pgp-signature, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org, Bdale Garbee <bdale@gag.com>:
Bug#647522; Package gzip. (Mon, 06 Feb 2012 22:33:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to Paul Eggert <eggert@cs.ucla.edu>:
Extra info received and forwarded to list. Copy sent to Bdale Garbee <bdale@gag.com>. (Mon, 06 Feb 2012 22:33:03 GMT) Full text and rfc822 format available.

Message #35 received at 647522@bugs.debian.org (full text, mbox):

From: Paul Eggert <eggert@cs.ucla.edu>
To: Neil Williams <codehelp@debian.org>
Cc: 647522@bugs.debian.org, bug-gzip@gnu.org, Riku Voipio <riku.voipio@iki.fi>
Subject: Re: non-deterministic compression for CREDITS.gz in libppl9 amd64 & armel
Date: Mon, 06 Feb 2012 14:21:15 -0800
I can't reproduce the problem on x86-64 with vanilla
gzip 1.4 and vanilla gzip 1.3.12.  So the problem appears to be
either architecture-dependent, or it's a property of
the Debian patches to gzip, or something like that, and
I expect we'll need more information about how to
reproduce the problem.  It looks like the problem is with
1.3.12-9 on armel so you might want to focus your attention
there.




Information forwarded to debian-bugs-dist@lists.debian.org, Bdale Garbee <bdale@gag.com>:
Bug#647522; Package gzip. (Tue, 07 Feb 2012 00:06:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to Neil Williams <codehelp@debian.org>:
Extra info received and forwarded to list. Copy sent to Bdale Garbee <bdale@gag.com>. (Tue, 07 Feb 2012 00:06:04 GMT) Full text and rfc822 format available.

Message #40 received at 647522@bugs.debian.org (full text, mbox):

From: Neil Williams <codehelp@debian.org>
To: eggert@cs.ucla.edu
Cc: 647522@bugs.debian.org, bug-gzip@gnu.org, riku.voipio@iki.fi
Subject: Re: non-deterministic compression for CREDITS.gz in libppl9 amd64 & kfreebsd-amd64
Date: Tue, 7 Feb 2012 00:02:46 +0000
[Message part 1 (text/plain, inline)]
On Mon, 06 Feb 2012 14:21:15 -0800
Paul Eggert <eggert@cs.ucla.edu> wrote:

> I can't reproduce the problem on x86-64 with vanilla
> gzip 1.4 and vanilla gzip 1.3.12.  So the problem appears to be
> either architecture-dependent, or it's a property of
> the Debian patches to gzip, or something like that, and
> I expect we'll need more information about how to
> reproduce the problem.  It looks like the problem is with
> 1.3.12-9 on armel so you might want to focus your attention
> there.

The broken CREDITS.gz was created with gzip 1.4 from Debian unstable. I
happened to use 1.3.12 to test on a different armel machine but the
whole problem with this bug is that it is non-deterministic and simply
repeating the compression can "fix" the apparent problem.

I added the extra information because the two versions of CREDITS.gz
are available via the packages specified, so rather than having to rely
on my own debug information, there is the opportunity to view/analyse
the actual .gz files involved in a situation where the checksums can be
checked and validated and the build logs exist so that the actual
version of gzip installed can be checked too.

gzip: already installed (1.4-1)
https://buildd.debian.org/status/fetch.php?pkg=ppl&arch=armel&ver=0.11.2-6&stamp=1318428664

For comparison, the i386 build used the same version of gzip on the
same file and gave a different .gz file:
i386:
99e2b9f8972ce00cfe57e3735881015e  usr/share/doc/libppl9/CREDITS.gz
armel:
0e52e84eebf41588865742edaff7b3c0  usr/share/doc/libppl9/CREDITS.gz

i386 log:
https://buildd.debian.org/status/fetch.php?pkg=ppl&arch=i386&ver=0.11.2-6&stamp=1318344010

More examples may well turn up soon as more people install the
MultiArch-aware version of dpkg which allows packages to be alongside
each other. This assumes and requires that files compressed on one
architecture are the same as the same file compressed on a different
architecture. It is quite possible that the bug in gzip is independent
of the architecture itself but that is how all of these issues are going
to show up.

Indeed, a quick check shows that this is not architecture-specific. The
kfreebsd-amd64 log shows that CREDITS.gz is a larger file than
linux-amd64:

https://buildd.debian.org/status/fetch.php?pkg=ppl&arch=kfreebsd-amd64&ver=0.11.2-6&stamp=1318348840

kfreebsd-amd64:
6344 2011-02-27 09:07 ./usr/share/doc/libppl9/CREDITS.gz

linux-amd64:
6343 2011-02-27 09:07 ./usr/share/doc/libppl9/CREDITS.gz

http://ftp.uk.debian.org/debian/pool/main/p/ppl/libppl9_0.11.2-6_kfreebsd-amd64.deb

0e52e84eebf41588865742edaff7b3c0  usr/share/doc/libppl9/CREDITS.gz

Same as armel but different to armhf, i386 and amd64.

I see no reason why a change of kernel or of gcc compiler flags for the
same version of gzip (all 1.4) would cause such non-deterministic
results from using gzip -9n

There is something else going on here, something internal to gzip which
is changing certain bytes inside the compressed file - in the same
manner. It is strange indeed for four separate machines to produce two
matching pairs of the same discrepancy when running the same code.

Ignore my tests with an older version of gzip - these results are all
with gzip 1.4-1. It doesn't matter if I decompress/recompress on amd64
or armel, the discrepancy goes away. The problem is that we cannot
anticipate when the discrepancy will occur, leading to packages failing
to install in random and unpredictable patterns.

This bug is going to be hard to reproduce but the results of it are
neither architecture dependent nor version dependent. Interestingly,
other text files in the same package, compressed on the same machine,
using the same options to gzip, do not differ. It's the peculiar
requirements of MultiArch which have brought this to light and in the
majority of cases the results of gzip -9n on the same file are
identical - but not always.

-- 


Neil Williams
=============
http://www.linux.codehelp.co.uk/

[Message part 2 (application/pgp-signature, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org, Bdale Garbee <bdale@gag.com>:
Bug#647522; Package gzip. (Tue, 07 Feb 2012 11:45:04 GMT) Full text and rfc822 format available.

Acknowledgement sent to Uli Martens <uli@youam.net>:
Extra info received and forwarded to list. Copy sent to Bdale Garbee <bdale@gag.com>. (Tue, 07 Feb 2012 11:45:08 GMT) Full text and rfc822 format available.

Message #45 received at 647522@bugs.debian.org (full text, mbox):

From: Uli Martens <uli@youam.net>
To: 647522@bugs.debian.org
Cc: riku.voipio@iki.fi
Subject: RE: non-deterministic compression results with gzip -n9
Date: Tue, 7 Feb 2012 12:27:34 +0100
[Message part 1 (text/plain, inline)]
Hello there,

using a small tool I wrote a while back, it's possible to see that the
actual differences are in the encoding of the last few bytes of
cleartext.

I've copied the decoding tool to http://youam.net/devel/rfc1952-dec

Attached is the difference of the decoded files from
<20111208102738.GA7727@afflict.kos.to>.

As to why this happens: no idea (yet)

Uli
[NEWS.diff (text/x-diff, attachment)]
[ChangeLog.pre-2-2.diff (text/x-diff, attachment)]

Information forwarded to debian-bugs-dist@lists.debian.org, Bdale Garbee <bdale@gag.com>:
Bug#647522; Package gzip. (Wed, 08 Feb 2012 05:57:04 GMT) Full text and rfc822 format available.

Acknowledgement sent to Zack Weinberg <zackw@panix.com>:
Extra info received and forwarded to list. Copy sent to Bdale Garbee <bdale@gag.com>. (Wed, 08 Feb 2012 05:57:04 GMT) Full text and rfc822 format available.

Message #50 received at 647522@bugs.debian.org (full text, mbox):

From: Zack Weinberg <zackw@panix.com>
To: 647522@bugs.debian.org
Cc: bug-gzip@gnu.org, Riku Voipio <riku.voipio@iki.fi>
Subject: Re: non-deterministic compression results with gzip -n9
Date: Tue, 7 Feb 2012 21:52:24 -0800
I've seen inexplicable nondeterminism like this before, and quite
often it's turned out to be controlled by the total size of the
command line argument area (that is, argv + environ + ELF auxv).
Changes in how big that is change the initial stack pointer address,
and while that *shouldn't* matter to anything, sometimes it does.

A shell loop of the form

export X=
i=0
while [ "$i" -lt 8192 ]; do
  # perform test here
  X=${X}x
  i=$((i+1))
done

should catch it if this is the cause.
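
The same experiment can be sketched in Python, assuming the command under test reads from stdin and writes to stdout (for example `gzip -9n`). The helper name and the ECHO_STDIN stand-in are hypothetical and exist only so the harness can be tried without gzip installed.

```python
import hashlib
import os
import subprocess
import sys

# Stand-in command that copies stdin to stdout; replace with ["gzip", "-9n"]
# to run the actual experiment.
ECHO_STDIN = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())",
]


def digests_by_env_padding(cmd, payload, paddings):
    """Run `cmd` once per padding length and hash what it writes to stdout.

    For each n in `paddings`, the environment variable X is set to n 'x'
    characters, which changes the size of the environment area and hence
    the initial stack pointer of the child process.
    """
    results = {}
    for n in paddings:
        env = dict(os.environ, X="x" * n)
        proc = subprocess.run(
            cmd, input=payload, stdout=subprocess.PIPE, env=env, check=True
        )
        results[n] = hashlib.sha256(proc.stdout).hexdigest()
    return results
```

Calling digests_by_env_padding(["gzip", "-9n"], data, range(8192)) and checking whether more than one distinct digest comes back would test the hypothesis; a single digest for every padding length would suggest the environment size is not the trigger.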




Information forwarded to debian-bugs-dist@lists.debian.org, Bdale Garbee <bdale@gag.com>:
Bug#647522; Package gzip. (Wed, 08 Feb 2012 12:24:01 GMT) Full text and rfc822 format available.

Acknowledgement sent to Cyril Brulebois <kibi@debian.org>:
Extra info received and forwarded to list. Copy sent to Bdale Garbee <bdale@gag.com>. (Wed, 08 Feb 2012 12:24:09 GMT) Full text and rfc822 format available.

Message #55 received at 647522@bugs.debian.org (full text, mbox):

From: Cyril Brulebois <kibi@debian.org>
To: Zack Weinberg <zackw@panix.com>, 647522@bugs.debian.org
Cc: bug-gzip@gnu.org, Riku Voipio <riku.voipio@iki.fi>
Subject: Re: Bug#647522: non-deterministic compression results with gzip -n9
Date: Wed, 8 Feb 2012 12:36:10 +0100
[Message part 1 (text/plain, inline)]
Zack Weinberg <zackw@panix.com> (07/02/2012):
> I've seen inexplicable nondeterminism like this before, and quite
> often it's turned out to be controlled by the total size of the
> command line argument area (that is, argv + environ + ELF auxv).

FWIW, a quick look on kfreebsd-amd64 with ppl's CREDITS file led me to:
  gzip -9nf CREDITS → 6343 bytes

running dh_installdocs && dh_compress with DH_VERBOSE, I noticed the
following command line:
  gzip -9nf README CREDITS
and the result → 6344 bytes.

Playing on amd64:
cbrulebois@Cygnus:/tmp/ppl-0.11.2$ cp ../ppl-pristine/{CREDITS,README} .
cbrulebois@Cygnus:/tmp/ppl-0.11.2$ gzip -9nf CREDITS README
cbrulebois@Cygnus:/tmp/ppl-0.11.2$ ls -l *gz
-rw-r--r-- 1 cbrulebois cbrulebois 6343 Feb  8 12:34 CREDITS.gz
-rw-r--r-- 1 cbrulebois cbrulebois 8745 Feb  8 12:34 README.gz
cbrulebois@Cygnus:/tmp/ppl-0.11.2$ cp ../ppl-pristine/{CREDITS,README} .
cbrulebois@Cygnus:/tmp/ppl-0.11.2$ gzip -9nf README CREDITS
cbrulebois@Cygnus:/tmp/ppl-0.11.2$ ls -l *gz
-rw-r--r-- 1 cbrulebois cbrulebois 6344 Feb  8 12:34 CREDITS.gz
-rw-r--r-- 1 cbrulebois cbrulebois 8745 Feb  8 12:34 README.gz

It looks to me like it shouldn't be hard to figure out what happens here
given the few tests I performed with the above command lines. On a few
iterations, reproducibility (with a given input command line) doesn't
seem to be an issue.

Mraw,
KiBi.
[signature.asc (application/pgp-signature, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org, Bdale Garbee <bdale@gag.com>:
Bug#647522; Package gzip. (Wed, 08 Feb 2012 12:51:20 GMT) Full text and rfc822 format available.

Acknowledgement sent to Cyril Brulebois <kibi@debian.org>:
Extra info received and forwarded to list. Copy sent to Bdale Garbee <bdale@gag.com>. (Wed, 08 Feb 2012 12:51:27 GMT) Full text and rfc822 format available.

Message #60 received at 647522@bugs.debian.org (full text, mbox):

From: Cyril Brulebois <kibi@debian.org>
To: Zack Weinberg <zackw@panix.com>, 647522@bugs.debian.org
Cc: bug-gzip@gnu.org, Riku Voipio <riku.voipio@iki.fi>
Subject: Re: Bug#647522: non-deterministic compression results with gzip -n9
Date: Wed, 8 Feb 2012 13:15:00 +0100
[Message part 1 (text/plain, inline)]
Cyril Brulebois <kibi@debian.org> (08/02/2012):
> Playing on amd64:
> cbrulebois@Cygnus:/tmp/ppl-0.11.2$ cp ../ppl-pristine/{CREDITS,README} .
> cbrulebois@Cygnus:/tmp/ppl-0.11.2$ gzip -9nf CREDITS README
> cbrulebois@Cygnus:/tmp/ppl-0.11.2$ ls -l *gz
> -rw-r--r-- 1 cbrulebois cbrulebois 6343 Feb  8 12:34 CREDITS.gz
> -rw-r--r-- 1 cbrulebois cbrulebois 8745 Feb  8 12:34 README.gz
> cbrulebois@Cygnus:/tmp/ppl-0.11.2$ cp ../ppl-pristine/{CREDITS,README} .
> cbrulebois@Cygnus:/tmp/ppl-0.11.2$ gzip -9nf README CREDITS
> cbrulebois@Cygnus:/tmp/ppl-0.11.2$ ls -l *gz
> -rw-r--r-- 1 cbrulebois cbrulebois 6344 Feb  8 12:34 CREDITS.gz
> -rw-r--r-- 1 cbrulebois cbrulebois 8745 Feb  8 12:34 README.gz
> 
> It looks to me like it shouldn't be hard to figure out what happens here
> given the few tests I performed with the above command lines. On a few
> iterations, reproducibility (with a given input command line) doesn't
> seem to be an issue.

I think at least the attached patch won't hurt (when the DYN_ALLOC part
is fixed; and possibly turning that into a MEMSET-like macro).

And given dh_compress is passing files in an arbitrary order (it's using
“find” to detect files which need to be compressed), I think we have
an explanation for the apparently-hard-to-reproduce issues.

Mraw,
KiBi.
[gzip-zeroify-buffers.diff (text/x-diff, attachment)]
[signature.asc (application/pgp-signature, inline)]
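
To illustrate how a reused, uncleared buffer could make output depend on which file was processed earlier, here is a toy LZ77-style model. It is not gzip's deflate code, and the thread has not established at this point that this is the actual mechanism; every name and constant below is hypothetical. The assumption being modelled is a match finder that compares bytes past the end of the current input (into leftovers from the previous file) and only afterwards truncates the match length.

```python
MAX_MATCH = 8
MIN_MATCH = 3
WINDOW = bytearray(64)  # shared buffer, reused for every input


def toy_compress(data, zero_window=False):
    """Return a list of tokens: an int literal or a (distance, length) pair."""
    n = len(data)
    if n + MAX_MATCH > len(WINDOW):
        raise ValueError("input too large for the toy window")
    if zero_window:
        WINDOW[:] = bytes(len(WINDOW))
    WINDOW[:n] = data
    tokens = []
    pos = 0
    while pos < n:
        best_len, best_dist = 0, 0
        for cand in range(pos):
            length = 0
            # Deliberately not bounded by n: the comparison may run into
            # stale bytes left behind by the previous input.
            while length < MAX_MATCH and WINDOW[cand + length] == WINDOW[pos + length]:
                length += 1
            if length > best_len:
                best_len, best_dist = length, pos - cand
        best_len = min(best_len, n - pos)  # never emit bytes beyond the input
        if best_len >= MIN_MATCH:
            tokens.append((best_dist, best_len))
            pos += best_len
        else:
            tokens.append(data[pos])
            pos += 1
    return tokens


def toy_decompress(tokens):
    """Rebuild the original bytes from toy_compress() tokens."""
    out = bytearray()
    for token in tokens:
        if isinstance(token, tuple):
            dist, length = token
            for _ in range(length):
                out.append(out[-dist])
        else:
            out.append(token)
    return bytes(out)
```

In this model, compressing b"abcXabcYabc" after b"0123456789_Y" picks a different back-reference for the final "abc" than compressing it with a cleared window, yet both token streams decode to the same bytes. Passing zero_window=True plays the role of the ZEROIFY call on the window in the attached patch.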

Information forwarded to debian-bugs-dist@lists.debian.org, Bdale Garbee <bdale@gag.com>:
Bug#647522; Package gzip. (Wed, 08 Feb 2012 14:15:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to Neil Williams <codehelp@debian.org>:
Extra info received and forwarded to list. Copy sent to Bdale Garbee <bdale@gag.com>. (Wed, 08 Feb 2012 14:15:03 GMT) Full text and rfc822 format available.

Message #65 received at 647522@bugs.debian.org (full text, mbox):

From: Neil Williams <codehelp@debian.org>
To: debian-devel@lists.debian.org
Cc: 647522@bugs.debian.org
Subject: Re: Please test gzip -9n - related to dpkg with multiarch support
Date: Wed, 8 Feb 2012 14:10:19 +0000
[Message part 1 (text/plain, inline)]
On Wed, 8 Feb 2012 14:14:22 +0100
Cyril Brulebois <kibi@debian.org> wrote:

> Neil Williams <codehelp@debian.org> (07/02/2012):
> > I'd like to ask for some help with a bug which is tripping up my tests
> > with the multiarch-aware dpkg from experimental - #647522 -
> > non-deterministic behaviour of gzip -9n.
> 
> For those not subscribed to that bug, how to reproduce[1] and possible
> fix[2] are available now. There might be other places where buffers are
> reused, I only spent a few minutes on this during my lunch break.
> 
>  1. http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=55;bug=647522
>  2. http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=60;bug=647522

Thanks for taking that one stage on, Cyril. I ran out of time to look
at this any further yesterday, I've only just got back to the bug and
noticed the hint about multiple files on the command line from Zack
Weinberg. It makes sense that with a single file on the command line
the "aberrant" compressed file never appears, whereas with more than one
it can.

-- 


Neil Williams
=============
http://www.linux.codehelp.co.uk/

[Message part 2 (application/pgp-signature, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org, Bdale Garbee <bdale@gag.com>:
Bug#647522; Package gzip. (Wed, 08 Feb 2012 14:27:04 GMT) Full text and rfc822 format available.

Acknowledgement sent to Fabian Greffrath <fabian@greffrath.com>:
Extra info received and forwarded to list. Copy sent to Bdale Garbee <bdale@gag.com>. (Wed, 08 Feb 2012 14:27:04 GMT) Full text and rfc822 format available.

Message #70 received at 647522@bugs.debian.org (full text, mbox):

From: Fabian Greffrath <fabian@greffrath.com>
To: 647522@bugs.debian.org
Cc: Cyril Brulebois <kibi@debian.org>, Zack Weinberg <zackw@panix.com>, bug-gzip@gnu.org, Riku Voipio <riku.voipio@iki.fi>
Subject: Re: Bug#647522: non-deterministic compression results with gzip -n9
Date: Wed, 08 Feb 2012 15:22:50 +0100
> I think at least the attached patch won't hurt (when the DYN_ALLOC part
> is fixed; and possibly turning that into a MEMSET-like macro).

Just an idea, but couldn't ZEROIFY in the DYN_ALLOC part be defined as 
free() and subsequent calloc() of the arrays preserving their size?

 - Fabian





Information forwarded to debian-bugs-dist@lists.debian.org, Bdale Garbee <bdale@gag.com>:
Bug#647522; Package gzip. (Wed, 08 Feb 2012 14:33:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to Adam Borowski <kilobyte@angband.pl>:
Extra info received and forwarded to list. Copy sent to Bdale Garbee <bdale@gag.com>. (Wed, 08 Feb 2012 14:33:03 GMT) Full text and rfc822 format available.

Message #75 received at 647522@bugs.debian.org (full text, mbox):

From: Adam Borowski <kilobyte@angband.pl>
To: debian-devel@lists.debian.org, 647522@bugs.debian.org
Subject: Re: Please test gzip -9n - related to dpkg with multiarch support
Date: Wed, 8 Feb 2012 15:06:46 +0100
On Wed, Feb 08, 2012 at 02:14:22PM +0100, Cyril Brulebois wrote:
> For those not subscribed to that bug, how to reproduce[1] and possible
> fix[2] are available now. There might be other places where buffers are
> reused, I only spent a few minutes on this during my lunch break.
> 
>  2. http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=60;bug=647522

Even if you ensure a particular build behaves exactly the same on a given
architecture, you're merely introducing future problems.

gzip's output is likely to change:
* on a new version
* after a bugfix (including security ones)
* on a different architecture
* with different optimizations
* with a different implementation (like those parallel ones)
* possibly with a different moon phase

Especially the first is pretty guaranteed to bite: whenever the upstream
does a small improvement, binaries in the archive get invalidated until
rebuilt with the new gzip.

Breaking the idea of diverting /bin/gzip to pigz is not nice either.

-- 
// If you believe in so-called "intellectual property", please immediately
// cease using counterfeit alphabets.  Instead, contact the nearest temple
// of Amon, whose priests will provide you with scribal services for all
// your writing needs, for Reasonable and Non-Discriminatory prices.




Information forwarded to debian-bugs-dist@lists.debian.org, Bdale Garbee <bdale@gag.com>:
Bug#647522; Package gzip. (Wed, 08 Feb 2012 17:09:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to Riku Voipio <riku.voipio@iki.fi>:
Extra info received and forwarded to list. Copy sent to Bdale Garbee <bdale@gag.com>. (Wed, 08 Feb 2012 17:09:03 GMT) Full text and rfc822 format available.

Message #80 received at 647522@bugs.debian.org (full text, mbox):

From: Riku Voipio <riku.voipio@iki.fi>
To: Cyril Brulebois <kibi@debian.org>
Cc: Zack Weinberg <zackw@panix.com>, 647522@bugs.debian.org, bug-gzip@gnu.org, Riku Voipio <riku.voipio@iki.fi>
Subject: Re: Bug#647522: non-deterministic compression results with gzip -n9
Date: Wed, 8 Feb 2012 19:05:40 +0200
Thanks Cyril for tracking this down.

On Wed, Feb 08, 2012 at 01:15:00PM +0100, Cyril Brulebois wrote:
> I think at least the attached patch won't hurt (when the DYN_ALLOC part
> is fixed; and possibly turning that into a MEMSET-like macro).

> And given dh_compress is passing files in an arbitrary order (it's using
> “find” to detect files which need to be compressed), I think we have
> an explanation for the apparently-hard-to-reproduce issues.

After some quick testing, ZEROIFY on the window is enough. However, clearing
all buffers is a good defensive strategy, I think.

> Mraw,
> KiBi.

> diff --git a/gzip.c b/gzip.c
> index b867350..1153bde 100644
> --- a/gzip.c
> +++ b/gzip.c
> @@ -561,6 +561,19 @@ int main (int argc, char **argv)
>  	    SET_BINARY_MODE(fileno(stdout));
>  	}
>          while (optind < argc) {
> +
> +	    /* Make sure buffers are reset to 0 to ensure reproducibility when handling several files */
> +	    ZEROIFY(uch, inbuf,  INBUFSIZ +INBUF_EXTRA);
> +	    ZEROIFY(uch, outbuf, OUTBUFSIZ+OUTBUF_EXTRA);
> +	    ZEROIFY(ush, d_buf,  DIST_BUFSIZE);
> +	    ZEROIFY(uch, window, 2L*WSIZE);
> +#ifndef MAXSEG_64K
> +	    ZEROIFY(ush, tab_prefix, 1L<<BITS);
> +#else
> +	    ZEROIFY(ush, tab_prefix0, 1L<<(BITS-1));
> +	    ZEROIFY(ush, tab_prefix1, 1L<<(BITS-1));
> +#endif
> +
>  	    treat_file(argv[optind++]);
>  	}
>      } else {  /* Standard input */
> diff --git a/gzip.h b/gzip.h
> index 5270c56..7a1e84b 100644
> --- a/gzip.h
> +++ b/gzip.h
> @@ -119,11 +119,13 @@ extern int method;         /* compression method */
>        array = (type*)fcalloc((size_t)(((size)+1L)/2), 2*sizeof(type)); \
>        if (!array) xalloc_die (); \
>     }
> +#  error "ZEROIFY needs an implementation, KiBi is lazy"
>  #  define FREE(array) {if (array != NULL) fcfree(array), array=NULL;}
>  #else
>  #  define EXTERN(type, array)  extern type array[]
>  #  define DECLARE(type, array, size)  type array[size]
>  #  define ALLOC(type, array, size)
> +#  define ZEROIFY(type, array, size) { for (int i=0; i<size; i++) { array[i] = 0; } }
>  #  define FREE(array)
>  #endif
>  







Information forwarded to debian-bugs-dist@lists.debian.org, Bdale Garbee <bdale@gag.com>:
Bug#647522; Package gzip. (Wed, 08 Feb 2012 19:33:04 GMT) Full text and rfc822 format available.

Acknowledgement sent to Paul Eggert <eggert@cs.ucla.edu>:
Extra info received and forwarded to list. Copy sent to Bdale Garbee <bdale@gag.com>. (Wed, 08 Feb 2012 19:33:04 GMT) Full text and rfc822 format available.

Message #85 received at 647522@bugs.debian.org (full text, mbox):

From: Paul Eggert <eggert@cs.ucla.edu>
To: Riku Voipio <riku.voipio@iki.fi>
Cc: Cyril Brulebois <kibi@debian.org>, 647522@bugs.debian.org, Zack Weinberg <zackw@panix.com>, bug-gzip@gnu.org
Subject: Re: Bug#647522: non-deterministic compression results with gzip -n9
Date: Wed, 08 Feb 2012 11:29:41 -0800
Thanks very much for the patch.  But can someone who's looked into it
please explain why 'window' needs to be zeroed out?  This will save
me time in reviewing the patch.  Thanks.




Information forwarded to debian-bugs-dist@lists.debian.org, Bdale Garbee <bdale@gag.com>:
Bug#647522; Package gzip. (Thu, 09 Feb 2012 11:06:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to Goswin von Brederlow <goswin-v-b@web.de>:
Extra info received and forwarded to list. Copy sent to Bdale Garbee <bdale@gag.com>. (Thu, 09 Feb 2012 11:06:17 GMT) Full text and rfc822 format available.

Message #90 received at 647522@bugs.debian.org (full text, mbox):

From: Goswin von Brederlow <goswin-v-b@web.de>
To: Adam Borowski <kilobyte@angband.pl>
Cc: debian-devel@lists.debian.org, 647522@bugs.debian.org
Subject: Re: Please test gzip -9n - related to dpkg with multiarch support
Date: Thu, 09 Feb 2012 12:02:21 +0100
Adam Borowski <kilobyte@angband.pl> writes:

> On Wed, Feb 08, 2012 at 02:14:22PM +0100, Cyril Brulebois wrote:
>> For those not subscribed to that bug, how to reproduce[1] and possible
>> fix[2] are available now. There might be other places where buffers are
>> reused, I only spent a few minutes on this during my lunch break.
>> 
>>  2. http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=60;bug=647522
>
> Even if you ensure a particular build behaves exactly the same on a given
> architecture, you're merely introducing future problems.
>
> gzip's output is likely to change:
> * on a new version

Yes, but not a big problem (other than a small race condition) since all
buildds should have the same version.

> * after a bugfix (including security ones)

Yes, but not a problem (other than a small race condition) since all
buildds should have the same version.

> * on a different architecture

No. I consider that a bug.

> * with different optimizations

Not a problem.

> * with a different implementation (like those parallel ones)

Not a problem (yet). We only have one gzip. pigz doesn't replace gzip.

> * possibly with a different moon phase

No. I consider that a bug.

> Especially the first is pretty guaranteed to bite: whenever the upstream
> does a small improvement, binaries in the archive get invalidated until
> rebuilt with the new gzip.

Not true. Packages only break if they are built with one gzip on one
arch and another on other archs. On gzip uploads there is a window where
archs will have different gzip versions, so this is of some concern. But
it is not as bad as you make it look.

> Breaking the ideas for diverting /bin/gzip by pigz is not nice, either.

True. But why should gzip and pigz give different output? They should be
able to result in the same compressed output.

I think for pigz one problem is where to split the input. Making it
split at the same points as gzip --rsyncable does (and using that option
in gzip) could be a solution.

Or files in /usr/share/doc (where we have the collisions) could be
compressed with /usr/bin/gzip.gzip (assuming that would be the name of
the real binary providing the gzip alternative).

MfG
        Goswin




Information forwarded to debian-bugs-dist@lists.debian.org, Bdale Garbee <bdale@gag.com>:
Bug#647522; Package gzip. (Thu, 09 Feb 2012 15:51:05 GMT) Full text and rfc822 format available.

Acknowledgement sent to Joey Hess <joeyh@debian.org>:
Extra info received and forwarded to list. Copy sent to Bdale Garbee <bdale@gag.com>. (Thu, 09 Feb 2012 15:51:05 GMT) Full text and rfc822 format available.

Message #95 received at 647522@bugs.debian.org (full text, mbox):

From: Joey Hess <joeyh@debian.org>
To: debian-devel@lists.debian.org, 647522@bugs.debian.org
Subject: Re: Please test gzip -9n - related to dpkg with multiarch support
Date: Thu, 9 Feb 2012 11:45:52 -0400
[Message part 1 (text/plain, inline)]
Goswin von Brederlow wrote:
> > * after a bugfix (including security ones)
> 
> Yes, but not a problem (other than a small race condition) since all
> buildds should have the same version.

And then if I have a multiarch system, and want to locally download the
source of some library, build it and install it, dpkg will complain if I
didn't use the same gzip that was used to build other arch versions I
have installed.

-- 
see shy jo
[signature.asc (application/pgp-signature, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org, Bdale Garbee <bdale@gag.com>:
Bug#647522; Package gzip. (Thu, 09 Feb 2012 16:24:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to Bastian Blank <waldi@debian.org>:
Extra info received and forwarded to list. Copy sent to Bdale Garbee <bdale@gag.com>. (Thu, 09 Feb 2012 16:24:03 GMT) Full text and rfc822 format available.

Message #100 received at 647522@bugs.debian.org (full text, mbox):

From: Bastian Blank <waldi@debian.org>
To: debian-devel@lists.debian.org
Cc: 647522@bugs.debian.org
Subject: Re: Please test gzip -9n - related to dpkg with multiarch support
Date: Thu, 9 Feb 2012 17:21:32 +0100
On Thu, Feb 09, 2012 at 11:45:52AM -0400, Joey Hess wrote:
> And then if I have a multiarch system, and want to locally download the
> source of some library, build it and install it, dpkg will complain if I
> didn't use the same gzip that was used to build other arch versions I
> have installed.

dpkg would complain anyway, because the versions are different.

Bastian

-- 
The sight of death frightens them [Earthers].
		-- Kras the Klingon, "Friday's Child", stardate 3497.2




Information forwarded to debian-bugs-dist@lists.debian.org, Bdale Garbee <bdale@gag.com>:
Bug#647522; Package gzip. (Fri, 10 Feb 2012 17:12:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to Ian Jackson <ijackson@chiark.greenend.org.uk>:
Extra info received and forwarded to list. Copy sent to Bdale Garbee <bdale@gag.com>. (Fri, 10 Feb 2012 17:12:03 GMT) Full text and rfc822 format available.

Message #105 received at 647522@bugs.debian.org (full text, mbox):

From: Ian Jackson <ijackson@chiark.greenend.org.uk>
To: Bastian Blank <waldi@debian.org>
Cc: debian-devel@lists.debian.org, 647522@bugs.debian.org
Subject: Re: Please test gzip -9n - related to dpkg with multiarch support
Date: Fri, 10 Feb 2012 17:09:50 +0000
Bastian Blank writes ("Re: Please test gzip -9n - related to dpkg with multiarch support"):
>On Thu, Feb 09, 2012 at 11:45:52AM -0400, Joey Hess wrote:
>>And then if I have a multiarch system, and want to locally download the
>>source of some library, build it and install it, dpkg will complain if I
>>didn't use the same gzip that was used to build other arch versions I
>>have installed.
>
>dpkg would complain anyway, because the versions are different.

(a) You could make them not be.  Eg just download the existing source
    code and tweak it and rebuild and install.

(b) There will surely be some --force- option you can use to override
    this and it should not be necessary to also override problems
    involving different versions of the same files.

Ian.




Information forwarded to debian-bugs-dist@lists.debian.org, Bdale Garbee <bdale@gag.com>:
Bug#647522; Package gzip. (Mon, 13 Feb 2012 00:51:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to Cyril Brulebois <kibi@debian.org>:
Extra info received and forwarded to list. Copy sent to Bdale Garbee <bdale@gag.com>. (Mon, 13 Feb 2012 00:51:03 GMT) Full text and rfc822 format available.

Message #110 received at 647522@bugs.debian.org (full text, mbox):

From: Cyril Brulebois <kibi@debian.org>
To: Paul Eggert <eggert@cs.ucla.edu>
Cc: Riku Voipio <riku.voipio@iki.fi>, 647522@bugs.debian.org, Zack Weinberg <zackw@panix.com>, bug-gzip@gnu.org
Subject: Re: Bug#647522: non-deterministic compression results with gzip -n9
Date: Mon, 13 Feb 2012 01:49:56 +0100
[Message part 1 (text/plain, inline)]
Paul Eggert <eggert@cs.ucla.edu> (08/02/2012):
> Thanks very much for the patch.  But can someone who's looked into it
> please explain why 'window' needs to be zeroed out?  This will save me
> time in reviewing the patch.  Thanks.

Welcome. Here's a slightly more detailed analysis. You can find attached
the sample files I used (still from the ppl source package). I managed
to track it down to deflate() (what a surprise!), so I decided to check
the matches. Printing only match_length showed a single difference;
printing both match_length and match_start (which, according to a
comment, is set by longest_match()) showed two.

To produce the below-quoted diff, using the attached patch, I did:
| tar xfz ppl.tar.gz
| gzip -9f README CREDITS > match-2
| tar xfz ppl.tar.gz
| gzip -9f CREDITS > match-1
| diff -u match-2 match-1

Of course there's an extra file showing up in the diff, but skipping it,
here's the diff:
|  match: 46 @ 19459
|  match: 8 @ 15659
|  match: 8 @ 15659
| -match: 7 @ 17193
| -match: 6 @ 17193
| +match: 6 @ 19510
| +match: 6 @ 19510
| [ no more matches ]

Interestingly, the size of the CREDITS file is 19579.

Mraw,
KiBi.
[0001-Pin-point-dirty-window-bug.patch (text/x-diff, attachment)]
[ppl.tar.gz (application/octet-stream, attachment)]
[signature.asc (application/pgp-signature, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org, Bdale Garbee <bdale@gag.com>:
Bug#647522; Package gzip. (Sun, 18 Mar 2012 18:33:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Paul Eggert <eggert@cs.ucla.edu>:
Extra info received and forwarded to list. Copy sent to Bdale Garbee <bdale@gag.com>. (Sun, 18 Mar 2012 18:33:03 GMT) Full text and rfc822 format available.

Message #115 received at 647522@bugs.debian.org (full text, mbox):

From: Paul Eggert <eggert@cs.ucla.edu>
To: Cyril Brulebois <kibi@debian.org>
Cc: Riku Voipio <riku.voipio@iki.fi>, 647522@bugs.debian.org, Zack Weinberg <zackw@panix.com>, bug-gzip@gnu.org, Bdale Garbee <bdale@gag.com>
Subject: Re: Bug#647522: non-deterministic compression results with gzip -n9
Date: Sun, 18 Mar 2012 11:19:51 -0700
Cyril, thanks for the test case.  When I used 'valgrind' on it
I found where gzip is accessing uninitialized data.  I pushed
into gzip master the patch at the end of this message; it fixed
things for me.

The Debian patch, which zeros out a lot of buffers, should
work if gzip is compressing regular files, but may have
problems in unusual cases if gzip compresses data from
pipes, devices, or other non-regular files, because in that
case short reads may later cause garbage to be put into the
dictionary.  So I suggest using the following patch instead.

http://git.savannah.gnu.org/cgit/gzip.git/commit/?id=0a284baeaedca68017f46d2646e4c921aa98a90d

From b9de47462b1b487cf4024b4c157ee5ac6c5849c3 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert@cs.ucla.edu>
Date: Sun, 18 Mar 2012 11:07:02 -0700
Subject: [PATCH] gzip: fix nondeterministic compression results

Reported by Jakub Wilk in <http://bugs.debian.org/647522>.
* deflate.c (fill_window): Don't let garbage pollute the dictionary.
---
 deflate.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/deflate.c b/deflate.c
index 6c19552..5405f10 100644
--- a/deflate.c
+++ b/deflate.c
@@ -571,6 +571,8 @@ local void fill_window()
         n = read_buf((char*)window+strstart+lookahead, more);
         if (n == 0 || n == (unsigned)EOF) {
             eofile = 1;
+            /* Don't let garbage pollute the dictionary.  */
+            memzero (window + strstart + lookahead, MIN_MATCH - 1);
         } else {
             lookahead += n;
         }
-- 
1.7.6.5






Reply sent to Bdale Garbee <bdale@gag.com>:
You have taken responsibility. (Mon, 19 Mar 2012 10:36:16 GMT) Full text and rfc822 format available.

Notification sent to Jakub Wilk <jwilk@debian.org>:
Bug acknowledged by developer. (Mon, 19 Mar 2012 10:36:17 GMT) Full text and rfc822 format available.

Message #120 received at 647522-close@bugs.debian.org (full text, mbox):

From: Bdale Garbee <bdale@gag.com>
To: 647522-close@bugs.debian.org
Subject: Bug#647522: fixed in gzip 1.4-5
Date: Mon, 19 Mar 2012 10:32:29 +0000
Source: gzip
Source-Version: 1.4-5

We believe that the bug you reported is fixed in the latest version of
gzip, which is due to be installed in the Debian FTP archive:

gzip-win32_1.4-5_all.deb
  to main/g/gzip/gzip-win32_1.4-5_all.deb
gzip_1.4-5.debian.tar.gz
  to main/g/gzip/gzip_1.4-5.debian.tar.gz
gzip_1.4-5.dsc
  to main/g/gzip/gzip_1.4-5.dsc
gzip_1.4-5_i386.deb
  to main/g/gzip/gzip_1.4-5_i386.deb



A summary of the changes between this version and the previous one is
attached.

Thank you for reporting the bug, which will now be closed.  If you
have further comments please address them to 647522@bugs.debian.org,
and the maintainer will reopen the bug report if appropriate.

Debian distribution maintenance software
pp.
Bdale Garbee <bdale@gag.com> (supplier of updated gzip package)

(This message was generated automatically at their request; if you
believe that there is a problem with it please contact the archive
administrators by mailing ftpmaster@debian.org)


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

Format: 1.8
Date: Mon, 19 Mar 2012 11:07:22 +0100
Source: gzip
Binary: gzip gzip-win32
Architecture: source all i386
Version: 1.4-5
Distribution: unstable
Urgency: low
Maintainer: Bdale Garbee <bdale@gag.com>
Changed-By: Bdale Garbee <bdale@gag.com>
Description: 
 gzip       - GNU compression utilities
 gzip-win32 - GNU compression utility (win32 build)
Closes: 647522
Changes: 
 gzip (1.4-5) unstable; urgency=low
 .
   * patch from upstream to address determinism issue, closes: #647522
Checksums-Sha1: 
 c5ea590ae2145135119cef64926b0cd41679065f 1847 gzip_1.4-5.dsc
 9c96de587730194563f80c7139d85bc5368478b3 15119 gzip_1.4-5.debian.tar.gz
 3bc5398532c7af2b7df61da3c7ccd9b6664824ba 81332 gzip-win32_1.4-5_all.deb
 46600ffe63b48be2c43f7631c00a7f053f5697cf 99318 gzip_1.4-5_i386.deb
Checksums-Sha256: 
 33ac981c6bd65ede1c44c364682b758b93bcdaf2439665edfa9679b9d5d916e7 1847 gzip_1.4-5.dsc
 390c7ff194ce861a51d7c4457251712bd5981544e2388960f25602fdb13f3c6b 15119 gzip_1.4-5.debian.tar.gz
 ea48141d48e0308db105dbf753041367a931bbf8a779e9c8daac68be20e19e2a 81332 gzip-win32_1.4-5_all.deb
 8bd6b2cfb02a0f7f3c2971619fbfca4fd5063a30e86cd2e030ed4da2b52ee3f8 99318 gzip_1.4-5_i386.deb
Files: 
 7f294d044cb837e34b706463e646f913 1847 utils required gzip_1.4-5.dsc
 df8c7bdaca1385ddd8bcf986c162ab71 15119 utils required gzip_1.4-5.debian.tar.gz
 2605819ddfee0e92bedae5b0587c04cc 81332 utils extra gzip-win32_1.4-5_all.deb
 4c118887568572430b52774dab4dba0a 99318 utils required gzip_1.4-5_i386.deb

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)

iQIVAwUBT2cGiDqTYZbAldlBAQpHAg/9HB1MpDhEtMFKgWwsZhyGWD17/p1cnBmu
CDe1MOl+unredKYe/2Ib7TSxMOi9Ly6JHnMKzWhCBRhuM0G7ufsYy8zwN3IFndp3
yqbDpTyEqCyw9Ck5Qs44AQ9kLhZCc+TqHTvkkZRk2BTg0/Y2IoiFG3VLELkgRXhH
+xKgehhY10421rRZB5r8jNYMfr3NzkvvyY92a4tmRewXXfI+IfiAX9oK15NRd87X
mBMrFM3PBDQz5LyrNvVZIpbO0+HyHeybsEKsNQT2jUwFbKbWRT7F47JGocaGtc4j
GnjgIm24rPlk4BqfHwUk+w8Ipdi/jLgOA3ghqipRt7rYBOQEJb4VvoOleNh+Pg1U
gjzTptcIFC10d0N/R4Yy6pfYvqBaAuOKCVHsKs0mErfKTV1dDp4vS3WW4auQ5Eyr
Kk9Z4WzoE1KQTXAaMZQgSyIy9Py46nsFUMs9I5z0AGw9TTowiqwxxcyltdwDf0hv
09jfNhZQem57a7r92ySeNKTdk4ykX0PPzk0XzxN8RuCnNZAS81UL8A+Z6mDHzC31
dMNLnvQC+edrAfkpsBrlx7b9hCENV6clfFtzqroU96jYQmmbe7Zm2gMh3J9tjA7m
FqrEfs2Akuz8BSKgY5WNNNgl1hCBpYY2ggRfHSig06UDeS5jFrAfvBTAicH7bAz+
5uNKlz8zaFY=
=x9do
-----END PGP SIGNATURE-----





Bug archived. Request was from Debbugs Internal Request <owner@bugs.debian.org> to internal_control@bugs.debian.org. (Fri, 27 Apr 2012 07:36:02 GMT) Full text and rfc822 format available.



Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Fri Apr 25 07:23:09 2014; Machine Name: buxtehude.debian.org

Debian Bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.