Debian Bug report logs - #641019
pristine-tar does not work with tar files made by openSUSE

Package: pristine-tar; Maintainer for pristine-tar is Tomasz Buchert <tomasz@debian.org>; Source for pristine-tar is src:pristine-tar.

Reported by: Jonathan Riddell <jriddell@ubuntu.com>

Date: Fri, 9 Sep 2011 13:45:01 UTC

Severity: normal

Found in version pristine-tar/1.14

Fixed in version pristine-tar/1.16

Done: Joey Hess <joeyh@debian.org>

Bug is archived. No further changes may be made.


Report forwarded to debian-bugs-dist@lists.debian.org, Joey Hess <joeyh@debian.org>:
Bug#641019; Package pristine-tar. (Fri, 09 Sep 2011 13:45:04 GMT).


Acknowledgement sent to Jonathan Riddell <jriddell@ubuntu.com>:
New Bug report received and forwarded. Copy sent to Joey Hess <joeyh@debian.org>. (Fri, 09 Sep 2011 13:45:05 GMT).


Message #5 received at submit@bugs.debian.org:

From: Jonathan Riddell <jriddell@ubuntu.com>
To: submit@bugs.debian.org
Subject: pristine-tar does not work with tar files made by openSUSE
Date: Fri, 9 Sep 2011 14:16:25 +0100
Package: pristine-tar
Version: 1.14

openSUSE includes a patch to bzip2 to change a maxlength value back to
the pre bzip2 1.0.3 setting.  This means upstream tars such as those
made for KDE can not be used with pristine-tar.

Offending patch
https://build.opensuse.org/package/view_file?file=bzip2-maxlen20.patch&package=bzip2&project=openSUSE%3AFactory&srcmd5=3ee4cf959e98e3ca50a881d1cdc13570

Example offending tar file:
https://launchpad.net/ubuntu/+archive/primary/+files/kde-l10n-is_4.7.1.orig.tar.bz2

Good tar file to compare to:
https://launchpad.net/ubuntu/oneiric/+source/kde-l10n-is/4:4.7.0-0ubuntu1/+files/kde-l10n-is_4.7.0.orig.tar.bz2

failing command:
pristine-bz2 --no-verbose --no-debug --no-keep gendelta kde-l10n-is_4.7.1.orig.tar.bz2 kde-l10n-is_4.7.0.orig.tar.bz2

Jonathan




Information forwarded to debian-bugs-dist@lists.debian.org:
Bug#641019; Package pristine-tar. (Mon, 12 Sep 2011 20:57:08 GMT).


Acknowledgement sent to Joey Hess <joeyh@debian.org>:
Extra info received and forwarded to list. (Mon, 12 Sep 2011 20:57:08 GMT).


Message #10 received at 641019@bugs.debian.org:

From: Joey Hess <joeyh@debian.org>
To: Jonathan Riddell <jriddell@ubuntu.com>, 641019@bugs.debian.org
Subject: Re: Bug#641019: pristine-tar does not work with tar files made by openSUSE
Date: Mon, 12 Sep 2011 16:54:03 -0400
Jonathan Riddell wrote:
> openSUSE includes a patch to bzip2 to change a maxlength value back to
> the pre bzip2 1.0.3 setting.  This means upstream tars such as those
> made for KDE can not be used with pristine-tar.
> 
> Offending patch
> https://build.opensuse.org/package/view_file?file=bzip2-maxlen20.patch&package=bzip2&project=openSUSE%3AFactory&srcmd5=3ee4cf959e98e3ca50a881d1cdc13570

What a PITA. Why did they do this?

I've done some preliminary work toward supporting this using the
existing old-bzip2 code in zgz, plus the block sort code from the
current bzip2. Now zgz --old-bzip2 --quirk suse comes fairly close to
reproducing the file, with a binary delta of just a few KB. The
remaining difference, which I have not managed to eliminate, is in the
header. Looks like the CRC and maybe some further stuff differs.

I may just use xdelta to handle this case. Have not written the code for
that, which would be a little tricky since pristine-bz2 doesn't use
xdelta so far.

Of course I'd rather find a way to reproduce the bzip2 output directly,
but want to avoid putting all of bzip2 1.0.5 into zgz, which would be
especially complicated since it already had old-bzip2 in it. I have 
a cut-down version of bzip2 1.0.5 in the separate-suse git branch that
could be used toward this end, maybe by being built into a helper
program for zgz.

Either way adds thousands of lines of code here, which is, as mentioned,
a PITA. It would be much better if upstream bzip2 could just get a
switch to enable the 20 byte huffman table.

-- 
see shy jo

Information forwarded to debian-bugs-dist@lists.debian.org, Joey Hess <joeyh@debian.org>:
Bug#641019; Package pristine-tar. (Fri, 02 Dec 2011 21:45:03 GMT).


Acknowledgement sent to Mikołaj Izdebski <zurgunt@gmail.com>:
Extra info received and forwarded to list. Copy sent to Joey Hess <joeyh@debian.org>. (Fri, 02 Dec 2011 21:45:04 GMT).


Message #15 received at 641019@bugs.debian.org:

From: Mikołaj Izdebski <zurgunt@gmail.com>
To: 641019@bugs.debian.org
Subject: pristine-tar does not work with tar files made by openSUSE
Date: Fri, 2 Dec 2011 22:41:49 +0100
Hello,

This bug is much more generic.

From theoretical point of view, for every plain (uncompressed) file
there exist *infinite* number of bz2 compressed files that correctly
decompress to the plain file.

In practice there exists number of different compressors that can
create different compressed files. Those include lbzip2 and pbzip2,
which may become even more popular as number of CPU cores increases
rapidly.

Even the newest version of unmodified upstream (or Debian) bzip2 can
produce different compressed files with the same block size. Basically
it's because bzip2 internally uses shellsort and quicksort, which
aren't stable sorting algorithms. Block-sorting can therefore produce
different results under different circumstances. If anyone cares I can
provide a proof-of-concept and/or explain why that happens.
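
A minimal sketch of the general point that many distinct bz2 files decode to the same plain file, assuming Python's standard bz2 module as a stand-in for the bzip2 command-line tool. It does not demonstrate the sorting-related effect described above; it only varies the block-size setting and the way the input is split into streams. The payload and variable names are made up for illustration.

```python
import bz2

# Hypothetical payload; any byte string would do.
data = b"pristine-tar example payload\n" * 2000
half = len(data) // 2

# One stream compressed with block-size setting 9.
single = bz2.compress(data, compresslevel=9)

# Two level-9 streams back to back. bz2.decompress() accepts
# concatenated streams and returns the joined plain data.
split = (bz2.compress(data[:half], compresslevel=9)
         + bz2.compress(data[half:], compresslevel=9))

# Same payload with block-size setting 1; the header digit alone differs.
level1 = bz2.compress(data, compresslevel=1)

candidates = [single, split, level1]
all_distinct = len(set(candidates)) == 3
all_roundtrip = all(bz2.decompress(c) == data for c in candidates)
print(all_distinct, all_roundtrip)
```

Since the split point can be chosen freely, this already gives a large family of valid compressed files for one input.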

The same thinking applies to gzip-compressed files.

IMO this bug should be merged with #563651, renamed to something like
"does not support tarballs compressed with alternative compressors"
and tagged wontfix (unless there is a sane solution, which I can't
think about now).

Mikołaj

PS. I know the internals of bzip2 *really* well. I am open for
discussion about any possible solutions.




Information forwarded to debian-bugs-dist@lists.debian.org, Joey Hess <joeyh@debian.org>:
Bug#641019; Package pristine-tar. (Wed, 04 Jan 2012 01:15:07 GMT).


Acknowledgement sent to Nicolás Alvarez <nicolas.alvarez@gmail.com>:
Extra info received and forwarded to list. Copy sent to Joey Hess <joeyh@debian.org>. (Wed, 04 Jan 2012 01:15:07 GMT).


Message #20 received at 641019@bugs.debian.org:

From: Nicolás Alvarez <nicolas.alvarez@gmail.com>
To: 641019@bugs.debian.org
Subject: pristine-tar does not work with tar files made by openSUSE
Date: Tue, 3 Jan 2012 22:10:47 -0300
I applied the maxlen s/17/20/ patch to Debian's bzip2 1.0.6-1 package,
installed the patched libbz2, and pristine-bz2 was now able to
recreate KDE's tarballs using pbzip2 (which dynamically links to
libbz2).

I didn't have any problems with headers or anything else as mentioned
in message 10. Just that one change made everything work.

-- 
Nicolas




Information forwarded to debian-bugs-dist@lists.debian.org:
Bug#641019; Package pristine-tar. (Wed, 04 Jan 2012 20:57:07 GMT).


Acknowledgement sent to Joey Hess <joeyh@debian.org>:
Extra info received and forwarded to list. (Wed, 04 Jan 2012 20:57:07 GMT).


Message #25 received at 641019@bugs.debian.org:

From: Joey Hess <joeyh@debian.org>
To: 641019@bugs.debian.org
Cc: Mikołaj Izdebski <zurgunt@gmail.com>
Subject: Re: pristine-tar does not work with tar files made by openSUSE
Date: Wed, 4 Jan 2012 16:56:12 -0400
Mikołaj Izdebski:
> From theoretical point of view, for every plain (uncompressed) file
> there exist *infinite* number of bz2 compressed files that correctly
> decompress to the plain file.

pristine-tar consists of a bet that, while this is certainly the
theoretical case, the number of actual implementations of a compressor
for a given file format will be manageable, and that moreover
implementations will deterministically produce the same result for a
given set of inputs.

There are two reasons to think this is the case. First, the 80/20 rule
applies; most people who want to compress a file with bzip2 are going
to do it using one of a few commonly available implementations, using
more or less the default parameters.

Secondly, pristine-gz is known to reproduce nearly every gzip file used in a
source package in Debian, which were created across a wide span of time,
on a diverse set of operating systems.

> Even the newest version of unmodified upstream (or Debian) bzip2 can
> produce different compressed files with the same block size. Basically
> it's because bzip2 internally uses shellsort and quicksort, which
> aren't stable sorting algorithms. Block-sorting can therefore produce
> different results under different circumstances. If anyone cares I can
> provide a proof-of-concept and/or explain why that happens.
> 
> The same thinking applies to gzip-compressed files.

It shouldn't matter if it's unstable as long as repeatedly running the
same implementation of the sort with the same inputs produces the same
result.

Of course an implementation of an unstable sorting algorithm could use
some value that varies between runs (ie, something based on the current
time or memory layout) to break ties in its comparison function, but at
least for gzip (and compress) implementations, that does not seem to
have ever been the case.
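
A small sketch of the distinction drawn here, under stated assumptions: the selection sort below is a hypothetical stand-in (it is not bzip2's block-sorting code) chosen because it is unstable yet has no run-to-run variation, and Python's bz2 module stands in for a single fixed compressor implementation.

```python
import bz2

def selection_sort_by_key(pairs):
    """Sort pairs on pair[0] only; unstable, but fully deterministic."""
    items = list(pairs)
    for i in range(len(items)):
        smallest = i
        for j in range(i + 1, len(items)):
            if items[j][0] < items[smallest][0]:
                smallest = j
        items[i], items[smallest] = items[smallest], items[i]
    return items

pairs = [(2, "a"), (2, "b"), (1, "c")]
first_run = selection_sort_by_key(pairs)
second_run = selection_sort_by_key(pairs)

# A stable sort keeps (2, "a") ahead of (2, "b"); the selection sort does not.
stable = sorted(pairs, key=lambda p: p[0])

# Same implementation, same input, same settings: identical compressed bytes.
data = b"same input, same implementation\n" * 1000
repeatable = bz2.compress(data, 9) == bz2.compress(data, 9)
```

The unstable sort reorders the equal keys relative to the stable one, but both of its runs agree, which is the only property the reproduction approach relies on.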

-- 
see shy jo

Information forwarded to debian-bugs-dist@lists.debian.org, Joey Hess <joeyh@debian.org>:
Bug#641019; Package pristine-tar. (Wed, 04 Jan 2012 22:30:03 GMT).


Acknowledgement sent to Mikołaj Izdebski <zurgunt@gmail.com>:
Extra info received and forwarded to list. Copy sent to Joey Hess <joeyh@debian.org>. (Wed, 04 Jan 2012 22:30:03 GMT).


Message #30 received at 641019@bugs.debian.org:

From: Mikołaj Izdebski <zurgunt@gmail.com>
To: 641019@bugs.debian.org
Subject: Re: pristine-tar does not work with tar files made by openSUSE
Date: Wed, 4 Jan 2012 23:26:49 +0100
>> From theoretical point of view, for every plain (uncompressed) file
>> there exist *infinite* number of bz2 compressed files that correctly
>> decompress to the plain file.
>
> pristine-tar consists of a bet that, while this is certainly the
> theoretical case, the number of actual implementations of a compressor
> for a given file format will be manageable, and that moreover
> implementations will deterministically produce the same result for a
> given set of inputs.
>
> There are two reasons to think this is the case. First, the 80/20 rule
> applies; most people who want to compress a file with bzip2 are going
> to do it using one of a few commonly available implementations, using
> more or less the default parameters.

Do you consider alternative bzip2 implementations available in Debian
(lbzip2, pbzip2, p7zip-full, libcommons-compress-java) as "commonly
available implementations"? They all produce different compressed
files for the same input file. Moreover, lbzip2-0.23 from stable
produces different files than lbzip2-2.1 from unstable.

Should any incompatibility with all those compressors be reported as
separate, independent pristine-tar bugs? If yes, I'd be happy to do
so.

> Secondly, pristine-gz is known to reproduce nearly every gzip file used in a
> source package in Debian, which were created across a wide span of time,
> on a diverse set of operating systems.

I believe that pristine-tar generates "binary diffs" for gzip files it
fails to reproduce, but doesn't do the same for bzip2 files. Maybe
implementing such feature for bzip2 files is the solution?
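
A toy sketch of what such a fallback could look like, assuming a regenerated file that is close to the original. It uses Python's difflib as a stand-in for a real binary delta tool such as xdelta; make_delta and apply_delta are hypothetical helpers written for this illustration and are not pristine-tar code.

```python
import bz2
from difflib import SequenceMatcher

def make_delta(base: bytes, target: bytes):
    """Describe target as copies from base plus literal byte runs."""
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif j2 > j1:
            # "replace" or "insert": store the literal bytes from target.
            ops.append(("data", target[j1:j2]))
        # "delete" needs no entry; those base bytes are simply not copied.
    return ops

def apply_delta(base: bytes, ops) -> bytes:
    """Rebuild the target from base and the recorded operations."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += base[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)

plain = b"example tarball contents\n" * 200
original = bz2.compress(plain, compresslevel=1)     # file we must reproduce
regenerated = bz2.compress(plain, compresslevel=9)  # our closest attempt

delta = make_delta(regenerated, original)
restored = apply_delta(regenerated, delta)
```

Only the delta and the compressor settings would need to be stored; the closer the regenerated file is to the original, the smaller the literal runs in the delta.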

> Of course an implementation of an unstable sorting algorithm could use
> some value that varies between runs (ie, something based on the current
> time or memory layout) to break ties in its comparison function, but at
> least for gzip (and compress) implementations, that does not seem to
> have ever been the case.

My point was that block size isn't the only factor the resulting file
depends on. There is also a "work factor", as described in bzip2
documentation. Even the same version of bzip2, with the same block
size given, for the same input can produce different outputs, given
that work factors are different. A proof of concept is available in
lbzip2 git repo:

   https://raw.github.com/kjn/lbzip2/master/tests/incomp

Mikołaj




Information forwarded to debian-bugs-dist@lists.debian.org:
Bug#641019; Package pristine-tar. (Thu, 05 Jan 2012 00:21:03 GMT).


Acknowledgement sent to Joey Hess <joeyh@debian.org>:
Extra info received and forwarded to list. (Thu, 05 Jan 2012 00:21:03 GMT).


Message #35 received at 641019@bugs.debian.org:

From: Joey Hess <joeyh@debian.org>
To: Mikołaj Izdebski <zurgunt@gmail.com>, 641019@bugs.debian.org
Subject: Re: Bug#641019: pristine-tar does not work with tar files made by openSUSE
Date: Wed, 4 Jan 2012 20:16:58 -0400
Mikołaj Izdebski wrote:
> Do you consider alternative bzip2 implementations available in Debian
> (lbzip2, pbzip2, p7zip-full, libcommons-compress-java) as "commonly
> available implementations"? They all produce different compressed
> files for the same input file. Moreover, lbzip2-0.23 from stable
> produces different files than lbzip2-2.1 from unstable.

I have not seen files produced by these yet afaik, but it's sure nice to
have a list to try when someone comes with a weird file; I did not know
about some of those!

Even bzip2 changed its output after 0.9.5d -- I have a program that uses
the compressor from the old version since some files needed it.

> I believe that pristine-tar generates "binary diffs" for gzip files it
> fails to reproduce, but doesn't do the same for bzip2 files. Maybe
> implementing such feature for bzip2 files is the solution?

I'll add it if I see a bz2 file that can nearly exactly be reproduced
and only needs the delta to get the rest of the way. Haven't yet.

> My point was that block size isn't the only factor the resulting file
> depends on. There is also a "work factor", as described in bzip2
> documentation. Even the same version of bzip2, with the same block
> size given, for the same input can produce different outputs, given
> that work factors are different. A proof of concept is available in
> lbzip2 git repo:

I have yet to see a bz2 file in the wild that uses a nonstandard
block size, so pristine-bz2 doesn't bother to try nonstandard block
sizes by default yet. Since bzip2 --exponential is not documented, I will
worry about it when I find a file using it in the wild.
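
The idea of trying only the standard block sizes can be sketched roughly as follows. This is an assumption-laden illustration, not pristine-bz2's actual logic: find_level is a hypothetical helper, and Python's bz2 module (which exposes only the nine standard block-size settings) stands in for the bzip2 binary.

```python
import bz2
from typing import Optional

def find_level(plain: bytes, original_bz2: bytes) -> Optional[int]:
    """Return the standard block-size setting (1-9) that reproduces
    original_bz2 byte for byte, or None if none of them does."""
    for level in range(1, 10):
        if bz2.compress(plain, compresslevel=level) == original_bz2:
            return level
    return None

plain = b"upstream source tarball bytes\n" * 500
half = len(plain) // 2

# A file made with a standard setting can be matched.
reproducible = find_level(plain, bz2.compress(plain, compresslevel=6))

# A file built differently (here: two concatenated streams) cannot,
# even though it decompresses to the same plain data.
odd_file = (bz2.compress(plain[:half], compresslevel=6)
            + bz2.compress(plain[half:], compresslevel=6))
not_reproducible = find_level(plain, odd_file)
```

When the search returns None, the file came from a compressor or setting outside the tried set, which is the situation this bug describes.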

-- 
see shy jo

Information forwarded to debian-bugs-dist@lists.debian.org, Joey Hess <joeyh@debian.org>:
Bug#641019; Package pristine-tar. (Thu, 05 Jan 2012 01:12:07 GMT).


Acknowledgement sent to Mikołaj Izdebski <zurgunt@gmail.com>:
Extra info received and forwarded to list. Copy sent to Joey Hess <joeyh@debian.org>. (Thu, 05 Jan 2012 01:12:07 GMT).


Message #40 received at 641019@bugs.debian.org:

From: Mikołaj Izdebski <zurgunt@gmail.com>
To: Joey Hess <joeyh@debian.org>
Cc: 641019@bugs.debian.org
Subject: Re: Bug#641019: pristine-tar does not work with tar files made by openSUSE
Date: Thu, 5 Jan 2012 02:09:16 +0100
>> Do you consider alternative bzip2 implementations available in Debian
>> (lbzip2, pbzip2, p7zip-full, libcommons-compress-java) as "commonly
>> available implementations"? They all produce different compressed
>> files for the same input file. Moreover, lbzip2-0.23 from stable
>> produces different files than lbzip2-2.1 from unstable.
>
> I have not seen files produced by these yet afaik, but it's sure nice to
> have a list to try when someone comes with a weird file; I did not know
> about some of those!

It's a vicious circle. People refrain from using alternative bzip2
implementations partly because pristine-tar doesn't support them. And
pristine-tar doesn't support them because they're not used widely
enough. For example I didn't compress my Debian lbzip2 package with
lbzip2 itself only because of problems with pristine-tar!

> Even bzip2 changed its output after 0.9.5d -- I have a program that uses
> the compressor from the old version since some files needed it.

Not the first and not the last time. The last change I'm aware of was
in Oct 2004 (version 1.0.3). This means that with default parameters
bzip2 1.0.2 and 1.0.3 can produce different files.

>> I believe that pristine-tar generates "binary diffs" for gzip files it
>> fails to reproduce, but doesn't do the same for bzip2 files. Maybe
>> implementing such feature for bzip2 files is the solution?
>
> I'll add it if I see a bz2 file that can nearly exactly be reproduced
> and only needs the delta to get the rest of the way. Haven't yet.

Does an artificially crafted bz2 file count? If so, I can easily create one.




Reply sent to Joey Hess <joeyh@debian.org>:
You have taken responsibility. (Thu, 05 Jan 2012 02:51:08 GMT).


Notification sent to Jonathan Riddell <jriddell@ubuntu.com>:
Bug acknowledged by developer. (Thu, 05 Jan 2012 02:51:08 GMT).


Message #45 received at 641019-close@bugs.debian.org:

From: Joey Hess <joeyh@debian.org>
To: 641019-close@bugs.debian.org
Subject: Bug#641019: fixed in pristine-tar 1.16
Date: Thu, 05 Jan 2012 02:48:50 +0000
Source: pristine-tar
Source-Version: 1.16

We believe that the bug you reported is fixed in the latest version of
pristine-tar, which is due to be installed in the Debian FTP archive:

pristine-tar_1.16.dsc
  to main/p/pristine-tar/pristine-tar_1.16.dsc
pristine-tar_1.16.tar.gz
  to main/p/pristine-tar/pristine-tar_1.16.tar.gz
pristine-tar_1.16_i386.deb
  to main/p/pristine-tar/pristine-tar_1.16_i386.deb



A summary of the changes between this version and the previous one is
attached.

Thank you for reporting the bug, which will now be closed.  If you
have further comments please address them to 641019@bugs.debian.org,
and the maintainer will reopen the bug report if appropriate.

Debian distribution maintenance software
pp.
Joey Hess <joeyh@debian.org> (supplier of updated pristine-tar package)

(This message was generated automatically at their request; if you
believe that there is a problem with it please contact the archive
administrators by mailing ftpmaster@debian.org)


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Format: 1.8
Date: Wed, 04 Jan 2012 21:44:23 -0400
Source: pristine-tar
Binary: pristine-tar
Architecture: source i386
Version: 1.16
Distribution: unstable
Urgency: low
Maintainer: Joey Hess <joeyh@debian.org>
Changed-By: Joey Hess <joeyh@debian.org>
Description: 
 pristine-tar - regenerate pristine tarballs
Closes: 641019
Changes: 
 pristine-tar (1.16) unstable; urgency=low
 .
   * pristine-bz2: Can recreate bz2 files created by Suse's
     patched bzip2. Closes: #641019
Checksums-Sha1: 
 d71192159863e1e0d3ec4f43e05a9acb9c8d22bf 1591 pristine-tar_1.16.dsc
 991fa549d31c4bfa85b4ff9b647119bc9e42edcc 138209 pristine-tar_1.16.tar.gz
 59cca57f6809dc7764c3209e80e048eb2443b442 192642 pristine-tar_1.16_i386.deb
Checksums-Sha256: 
 b5dbbcdd31b0ee7d651c6ebae7079338ecbbb9964d7e34ec662a59d6829a38d4 1591 pristine-tar_1.16.dsc
 55a580ee206d931074360e9ec2cdb46a1e382baea7c179906d211074e2084e9a 138209 pristine-tar_1.16.tar.gz
 d29cf53e571b67530676b5cd643c881f0311a89695d0c69e15f639ecbf5df1fb 192642 pristine-tar_1.16_i386.deb
Files: 
 0b07eb5bb540d461599cff84bf2d2c5b 1591 utils optional pristine-tar_1.16.dsc
 a7e9ff93b9f79b7d2da777dfd869820e 138209 utils optional pristine-tar_1.16.tar.gz
 c5ce75ba2030d0faef72da504c40eeb3 192642 utils optional pristine-tar_1.16_i386.deb

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)

iQIVAwUBTwUBE8kQ2SIlEuPHAQh02Q//eEefhdk77L4XooFSSAVU94lygoW0qDzp
dj3P0Yh3a0jiHX6Aa6DQnU3M9f0RkbfbSpCeYozG8Xc3oOreoh81dB1JQ9Jblc2h
hzUnXzLR1Z3c61eW1D1cb4a6O2Isj+A/wzOhlfU3rMLAeyLui+iKDp0zYRT6tmvA
w1w+WMCHyPoJvmtEjyOX7gGp+kxPd5mq5B4N56b/r4Bg9kU3dcMWLvKbAxpnKDCz
5xDMTzQe/HX3vIITSptLFNlX1gOChl9sOmf+x3BrS2I9ng0tfhed7v/8nFLU+Z3c
2qGtUY8VY1SxTTWBrLSsNbFT6NGineIe92zqSN6laUe0SFp5/cyw0RuDlxkFB2o8
SvgNMSXdNIfEFIiC6GmXzR0J18fz25p4K7uCSRxun+F9T4pBIzyHrRBhm3S2Ddi+
35zfVXhGAJaab6qDi+aKWEg3wV9ftngK9bJRv1SLWCfKwiLiTErzr91hvftB0Lc5
eYlbA4IP1n0UX1C5BTXbhOLAp5WNHCYFCKIVzzr0rbS3StemXYywOVFipHfvrnKJ
fPvGGzgbOCha7lzoeXZ6cjyBC2wqSG/+qT7AQ5IxYGInOYII9YMeR+AB1ooCtThj
20WMqcS4ckGD7G47BcLSsvpbhAxGjGfBe803vQumnW674vQTr5vMr2eJXH2w4CmS
J3MM34nMsD0=
=xK0A
-----END PGP SIGNATURE-----





Bug archived. Request was from Debbugs Internal Request <owner@bugs.debian.org> to internal_control@bugs.debian.org. (Sun, 05 Feb 2012 07:34:56 GMT).



Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Fri Jan 12 19:25:03 2018; Machine Name: buxtehude
