Debian Bug report logs - #719844
dpkg-deb: Make compression of {data,control}.tar.gz deterministic

Package: dpkg; Maintainer for dpkg is Dpkg Developers <debian-dpkg@lists.debian.org>; Source for dpkg is src:dpkg.

Reported by: Asheesh Laroia <asheesh@asheesh.org>

Date: Thu, 15 Aug 2013 23:54:02 UTC

Severity: normal

Found in version dpkg/1.16.10

Fixed in version dpkg/1.17.2

Done: Guillem Jover <guillem@debian.org>

Bug is archived. No further changes may be made.


Report forwarded to debian-bugs-dist@lists.debian.org, Dpkg Developers <debian-dpkg@lists.debian.org>:
Bug#719844; Package dpkg. (Thu, 15 Aug 2013 23:54:06 GMT) Full text and rfc822 format available.

Acknowledgement sent to Asheesh Laroia <asheesh@asheesh.org>:
New Bug report received and forwarded. Copy sent to Dpkg Developers <debian-dpkg@lists.debian.org>. (Thu, 15 Aug 2013 23:54:06 GMT) Full text and rfc822 format available.

Message #5 received at submit@bugs.debian.org (full text, mbox):

From: Asheesh Laroia <asheesh@asheesh.org>
To: submit@bugs.debian.org
Subject: dpkg-source: Make compresing of {data,control}.tar.gz a deterministic process
Date: Thu, 15 Aug 2013 19:42:17 -0400 (EDT)
Package: dpkg
Version: 1.16.10
Severity: normal

*** Please type your report below this line ***

Summary: In lib/dpkg/compress.c, I would like it if dpkg did not store 
timestamps in the gzip files. That way, the creation of the data.tar.gz 
would be deterministic.

In particular, when I build a binary package with the very same contents 
twice, I see that data.tar.xz and control.tar.gz are both different 
although they have the same contents (even timestamps).

Binary files one/control.tar.gz and two/control.tar.gz differ
Binary files one/data.tar.xz and two/data.tar.xz differ

In 1.16.10 I would add '-n' to the call to 'gzip' in a pipe. In current
dpkg git, I believe this may be fixed, but I am not sure. I am filing this 
bug so there is a clear statement of the general problem in the BTS, at 
least!



-- System Information:
Debian Release: squeeze/sid
  APT prefers oldstable
  APT policy: (500, 'oldstable'), (500, 'testing'), (500, 'stable'), (1, 
'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.32-5-amd64 (SMP w/8 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages dpkg depends on:
ii  libbz2-1.0         1.0.5-4               high-quality block-sorting 
file co
ii  libc6              2.17-3                Embedded GNU C Library: 
Shared lib
ii  liblzma5           5.1.1alpha+20120614-2 XZ-format compression library
ii  libselinux1        2.0.96-1              SELinux runtime shared 
libraries
ii  tar                1.23-2.1              GNU version of the tar 
archiving u
ii  zlib1g             1:1.2.7.dfsg-13       compression library - runtime

dpkg recommends no packages.

Versions of packages dpkg suggests:
ii  apt                           0.9.7.8    commandline package manager

-- no debconf information



Information forwarded to debian-bugs-dist@lists.debian.org, Dpkg Developers <debian-dpkg@lists.debian.org>:
Bug#719844; Package dpkg. (Fri, 16 Aug 2013 07:03:04 GMT) Full text and rfc822 format available.

Acknowledgement sent to Guillem Jover <guillem@debian.org>:
Extra info received and forwarded to list. Copy sent to Dpkg Developers <debian-dpkg@lists.debian.org>. (Fri, 16 Aug 2013 07:03:04 GMT) Full text and rfc822 format available.

Message #10 received at 719844@bugs.debian.org (full text, mbox):

From: Guillem Jover <guillem@debian.org>
To: Asheesh Laroia <asheesh@asheesh.org>, 719844@bugs.debian.org
Subject: Re: Bug#719844: dpkg-source: Make compresing of {data,control}.tar.gz a deterministic process
Date: Fri, 16 Aug 2013 08:58:02 +0200
Hi!

On Thu, 2013-08-15 at 19:42:17 -0400, Asheesh Laroia wrote:
> Package: dpkg
> Version: 1.16.10
> Severity: normal

> Summary: In lib/dpkg/compress.c, I would like it if dpkg did not
> store timestamps in the gzip files. That way, the creation of the
> data.tar.gz would be deterministic.
> 
> In particular, when I build a binary package with the very same
> contents twice, I see that data.tar.xz and control.tar.gz are both
> different although they have the same contents (even timestamps).
> 
> Binary files one/control.tar.gz and two/control.tar.gz differ
> Binary files one/data.tar.xz and two/data.tar.xz differ
> 
> In 1.16.10 I would add '-n' to the call to 'gzip' in a pipe. In
> current dpkg git, I believe this may be fixed, but I am not sure. I
> am filing this bug so there is a clear statement of the general
> problem in the BTS, at least!

The same will apply when building that deb package multiple times, the
timestamps will change for the ar headers. And I don't really want to
lose that data, because currently it is the only place where the build
time information is recorded. Do you only care that the members
themselves are deterministic or the whole deb package?

I think I'd be fine with not storing the timestamps in the compressed
members themselves, but not about the ar container.
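
As an illustration of where those container timestamps live (this sketch is not part of the original message and is not dpkg code): each ar member is preceded by a 60-byte header whose second field holds the modification time as decimal seconds. The helper names `ar_member` and `list_ar_members` are made up for this example, and the uid, gid and mode values are arbitrary.

```python
AR_MAGIC = b"!<arch>\n"
HEADER_SIZE = 60


def ar_member(name: str, data: bytes, mtime: int) -> bytes:
    """Serialize one ar member: 60-byte header, data, pad to even length."""
    # Fields: name(16) mtime(12) uid(6) gid(6) mode(8) size(10) magic(2)
    header = (
        f"{name:<16}{mtime:<12}{0:<6}{0:<6}{'100644':<8}{len(data):<10}"
    ).encode("ascii") + b"`\n"
    assert len(header) == HEADER_SIZE
    padding = b"\n" if len(data) % 2 else b""
    return header + data + padding


def list_ar_members(archive: bytes):
    """Return (name, mtime, size) for every member of an in-memory archive."""
    if not archive.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    offset = len(AR_MAGIC)
    members = []
    while offset < len(archive):
        header = archive[offset:offset + HEADER_SIZE].decode("ascii")
        name = header[0:16].rstrip()
        mtime = int(header[16:28])
        size = int(header[48:58])
        members.append((name, mtime, size))
        # Member data is padded to an even number of bytes.
        offset += HEADER_SIZE + size + (size % 2)
    return members


# Same member content, two build times: the archives differ byte for byte.
first = AR_MAGIC + ar_member("debian-binary", b"2.0\n", 1376610842)
second = AR_MAGIC + ar_member("debian-binary", b"2.0\n", 1376697242)
```

Here `first` and `second` differ only in the header's mtime field, which is the kind of difference that remains even when the compressed members are identical.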

Thanks,
Guillem



Information forwarded to debian-bugs-dist@lists.debian.org, Dpkg Developers <debian-dpkg@lists.debian.org>:
Bug#719844; Package dpkg. (Fri, 16 Aug 2013 08:57:04 GMT) Full text and rfc822 format available.

Acknowledgement sent to Guillem Jover <guillem@debian.org>:
Extra info received and forwarded to list. Copy sent to Dpkg Developers <debian-dpkg@lists.debian.org>. (Fri, 16 Aug 2013 08:57:04 GMT) Full text and rfc822 format available.

Message #15 received at 719844@bugs.debian.org (full text, mbox):

From: Guillem Jover <guillem@debian.org>
To: Asheesh Laroia <asheesh@asheesh.org>, 719844@bugs.debian.org
Subject: Re: Bug#719844: dpkg-source: Make compresing of {data,control}.tar.gz a deterministic process
Date: Fri, 16 Aug 2013 10:53:02 +0200
Hi!

On Fri, 2013-08-16 at 08:58:02 +0200, Guillem Jover wrote:
> On Thu, 2013-08-15 at 19:42:17 -0400, Asheesh Laroia wrote:
> > Package: dpkg
> > Version: 1.16.10
> > Severity: normal
> 
> > Summary: In lib/dpkg/compress.c, I would like it if dpkg did not
> > store timestamps in the gzip files. That way, the creation of the
> > data.tar.gz would be deterministic.

> > In particular, when I build a binary package with the very same
> > contents twice, I see that data.tar.xz and control.tar.gz are both
> > different although they have the same contents (even timestamps).

> > Binary files one/control.tar.gz and two/control.tar.gz differ
> > Binary files one/data.tar.xz and two/data.tar.xz differ
> > 
> > In 1.16.10 I would add '-n' to the call to 'gzip' in a pipe. In
> > current dpkg git, I believe this may be fixed, but I am not sure. I
> > am filing this bug so there is a clear statement of the general
> > problem in the BTS, at least!

dpkg has not used the gzip command for a very long time (prior to
dpkg 1.9.x), and zlib does not initialize the gzip header, so the
timestamp should be 0. If there are differences these should come from
something else, like different tar files for example.

I can of course add the option to the fallback command code, just out
of correctness, but that will not fix any difference you are currently
seeing.
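
A small sketch of the header behaviour described above (an illustration added for clarity, not dpkg code). It uses Python's `gzip` module to stand in for a writer that records a timestamp, and Python's `zlib` binding with a gzip wrapper to stand in for plain zlib; the function names are invented for this example. The MTIME field sits in bytes 4 to 7 of a gzip member (RFC 1952).

```python
import gzip
import io
import struct
import zlib


def gzip_with_mtime(data: bytes, mtime: int) -> bytes:
    """Compress with an explicit MTIME stored in the gzip header."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb", mtime=mtime) as gz:
        gz.write(data)
    return buf.getvalue()


def gzip_via_zlib(data: bytes) -> bytes:
    """Compress through zlib with a gzip wrapper and its default header."""
    # wbits=31 (16 + 15) selects the gzip container instead of raw zlib.
    compressor = zlib.compressobj(9, zlib.DEFLATED, 31)
    return compressor.compress(data) + compressor.flush()


def header_mtime(gz_bytes: bytes) -> int:
    """Read the little-endian 32-bit MTIME field from a gzip header."""
    return struct.unpack("<I", gz_bytes[4:8])[0]


payload = b"identical tar contents\n"
stamped_a = gzip_with_mtime(payload, 1376610842)
stamped_b = gzip_with_mtime(payload, 1376697242)
plain_a = gzip_via_zlib(payload)
plain_b = gzip_via_zlib(payload)
```

With a stored timestamp the two outputs differ even though the payload is the same; through the zlib path the MTIME field is left at zero and the outputs are identical.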

> The same will apply when building that deb package multiple times, the
> timestamps will change for the ar headers. And I don't really want to
> lose that data, because currently it is the only place where the build
> time information is recorded. Do you only care that the members
> themselves are deterministic or the whole deb package?
> 
> I think I'd be fine with not storing the timestamps in the compressed
> members themselves, but not about the ar container.

This still applies, though.

Thanks,
Guillem



Information forwarded to debian-bugs-dist@lists.debian.org, Dpkg Developers <debian-dpkg@lists.debian.org>:
Bug#719844; Package dpkg. (Fri, 16 Aug 2013 12:54:12 GMT) Full text and rfc822 format available.

Acknowledgement sent to Asheesh Laroia <asheesh@asheesh.org>:
Extra info received and forwarded to list. Copy sent to Dpkg Developers <debian-dpkg@lists.debian.org>. (Fri, 16 Aug 2013 12:54:12 GMT) Full text and rfc822 format available.

Message #20 received at 719844@bugs.debian.org (full text, mbox):

From: Asheesh Laroia <asheesh@asheesh.org>
To: Guillem Jover <guillem@debian.org>
Cc: 719844@bugs.debian.org
Subject: Re: Bug#719844: dpkg-source: Make compresing of {data,control}.tar.gz a deterministic process
Date: Fri, 16 Aug 2013 08:45:47 -0400 (EDT)
On Fri, 16 Aug 2013, Guillem Jover wrote:

> The same will apply when building that deb package multiple times, the 
> timestamps will change for the ar headers. And I don't really want to 
> lose that data, because currently it is the only place where the build time
> information is recorded. Do you only care that the members themselves 
> are deterministic or the whole deb package?
>
> I think I'd be fine with not storing the timestamps in the compressed 
> members themselves, but not about the ar container.

Thanks for the speedy replies! I'm excited and pleased by your 
responsiveness to the general ideas and specific issues.

I would prefer to keep even the 'ar' container free of those timestamps, 
but there are some workarounds I can imagine if you don't want to do that 
for now.

We could perhaps let 'dpkg' accept an argument to set that timestamp, 
and/or we could use 'faketime' when trying to reproduce a build, if we 
know what timestamp to reproduce. (My notion here is that it should be 
easy for someone to create the same bits that we publish in Debian, even 
if that reproducing effort is slightly different than just doing a fresh 
build.)

Another way that we could do it is to store the last debian/changelog 
timestamp in the ar header. That would be my favorite approach.
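
A rough sketch of how the latest debian/changelog timestamp could be turned into an epoch value suitable for an ar header (illustrative only; `latest_changelog_epoch` and the sample entry are invented here, and real tooling would more likely rely on dpkg-parsechangelog):

```python
import re
from email.utils import parsedate_to_datetime

# Trailer line of a changelog entry: " -- Name <email>  RFC-2822-style date"
TRAILER_RE = re.compile(r"^ -- .*? <[^>]*>  (?P<date>.+)$", re.MULTILINE)


def latest_changelog_epoch(changelog_text: str) -> int:
    """Return the date of the first (most recent) entry as Unix seconds."""
    match = TRAILER_RE.search(changelog_text)
    if match is None:
        raise ValueError("no changelog trailer line found")
    return int(parsedate_to_datetime(match.group("date")).timestamp())


SAMPLE_CHANGELOG = """\
hello (2.8-4) unstable; urgency=low

  * Example entry for this sketch.

 -- Jane Doe <jane@example.org>  Thu, 05 Dec 2013 04:56:31 +0100

hello (2.8-3) unstable; urgency=low

  * Older example entry.

 -- Jane Doe <jane@example.org>  Mon, 01 Jul 2013 10:00:00 +0000
"""

epoch = latest_changelog_epoch(SAMPLE_CHANGELOG)
```

Because the value depends only on the source package, two builds of the same source would agree on it.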

-- Asheesh.



Information forwarded to debian-bugs-dist@lists.debian.org, Dpkg Developers <debian-dpkg@lists.debian.org>:
Bug#719844; Package dpkg. (Fri, 16 Aug 2013 13:00:04 GMT) Full text and rfc822 format available.

Acknowledgement sent to Asheesh Laroia <asheesh@asheesh.org>:
Extra info received and forwarded to list. Copy sent to Dpkg Developers <debian-dpkg@lists.debian.org>. (Fri, 16 Aug 2013 13:00:04 GMT) Full text and rfc822 format available.

Message #25 received at 719844@bugs.debian.org (full text, mbox):

From: Asheesh Laroia <asheesh@asheesh.org>
To: Guillem Jover <guillem@debian.org>
Cc: 719844@bugs.debian.org
Subject: Re: Bug#719844: dpkg-source: Make compresing of {data,control}.tar.gz a deterministic process
Date: Fri, 16 Aug 2013 08:47:36 -0400 (EDT)
On Fri, 16 Aug 2013, Guillem Jover wrote:

> dpkg has not used the gzip command for a very long time (prior to
> dpkg 1.9.x), and zlib does not initialize the gzip header, so the
> timestamp should be 0. If there are differences these should come from
> something else, like different tar files for example.
>
> I can of course add the option to the fallback command code, just out of 
> correctness, but that will not fix any difference you are currently 
> seeing.

Interesting. I wonder if the problem is slightly different timestamps in 
the contents of data.tar.gz then, and this is a red herring.

Adding this to the fallback code sounds great, but yeah, I agree it's not 
super related to what I'm seeing. (-:

Thanks!

-- Asheesh.



Information forwarded to debian-bugs-dist@lists.debian.org, Dpkg Developers <debian-dpkg@lists.debian.org>:
Bug#719844; Package dpkg. (Sun, 18 Aug 2013 09:48:07 GMT) Full text and rfc822 format available.

Acknowledgement sent to Guillem Jover <guillem@debian.org>:
Extra info received and forwarded to list. Copy sent to Dpkg Developers <debian-dpkg@lists.debian.org>. (Sun, 18 Aug 2013 09:48:07 GMT) Full text and rfc822 format available.

Message #30 received at 719844@bugs.debian.org (full text, mbox):

From: Guillem Jover <guillem@debian.org>
To: Asheesh Laroia <asheesh@asheesh.org>, 719844@bugs.debian.org
Subject: Re: Bug#719844: dpkg-source: Make compresing of {data,control}.tar.gz a deterministic process
Date: Sun, 18 Aug 2013 11:45:43 +0200
Hi!

On Fri, 2013-08-16 at 08:45:47 -0400, Asheesh Laroia wrote:
> On Fri, 16 Aug 2013, Guillem Jover wrote:
> >The same will apply when building that deb package multiple times,
> >the timestamps will change for the ar headers. And I don't really
> >want to lose that data, because currently it is the only place where
> >the build time information is recorded. Do you only care that the
> >members themselves are deterministic or the whole deb package?
> >
> >I think I'd be fine with not storing the timestamps in the
> >compressed members themselves, but not about the ar container.

> I would prefer to keep even the 'ar' container free of those
> timestamps, but there are some workarounds I can imagine if you
> don't want to do that for now.
> 
> We could perhaps let 'dpkg' accept an argument to set that
> timestamp, and/or we could use 'faketime' when trying to reproduce a
> build, if we know what timestamp to reproduce. (My notion here is
> that it should be easy for someone to create the same bits that we
> publish in Debian, even if that reproducing effort is slightly
> different than just doing a fresh build.)
> 
> Another way that we could do it is to store the last
> debian/changelog timestamp in the ar header. That would be my
> favorite approach.

I've been thinking about this, and I think you might be trying to
solve the problem in the wrong place(s), and possibly there's a need
to step back and ponder what you really want out of all this,
to know where or how to best fix it.

The way I see it (and that I think you guys have maybe intermingled)
is that there are different types of changing information:

There are accidental leaks of changing information, stuff like usernames,
hostnames, timezone, output or sorting order depending on locale, data
dependent on random input like signatures or similar, etc, which might
just need fixing anyway because they leak information from the builder,
or should ideally be generated on the installed system.

Then there's possibly redundant changing information, like _some_
timestamps on filenames, versions, or even metadata.

And then there's changing information that conveys important data.
For example, in the generated data.tar the files will contain
different modification times, some will come untouched from the source
files if they just get copied, and others will be newer if the files
got created at build time. Preserving these timestamps seems important
to me, because you then know the possible staleness of the files. The
timestamp on the ar member lets you know when the package got built.
Possible future ar members containing gpg signatures from the builder
or the archive, would change and not be reproducible anyway. The recent
switch of dpkg-deb default compressor. Or possible future .deb format
revisions. Etc.

So my question is, what's really the stuff you'd like to know has not
changed? Just file paths, contents and permissions, for example? That
seems extremely reasonable w/o dropping possibly important information,
or getting us stuck in possibly old formats and similar. Would there
be something else?

Depending on that, maybe instead of dropping some of the information,
we might just need a way to for example easily compare .deb files,
ignoring unimportant changing data, ala «dpkg-deb --compare a.deb b.deb»
or something along those lines or some other new program.
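
To make the comparison idea concrete, here is a minimal sketch that treats two tar archives as equal when their paths, types, permissions and file contents match, ignoring modification times. It is not an existing dpkg-deb feature; `tar_fingerprint` and `make_tar` are names invented for this illustration, and ownership is deliberately left out of the comparison.

```python
import hashlib
import io
import tarfile


def tar_fingerprint(tar_bytes: bytes):
    """Summarize a tar archive by path, type, mode, size and content hash."""
    entries = []
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:*") as tar:
        for member in tar:
            digest = ""
            if member.isfile():
                digest = hashlib.sha256(tar.extractfile(member).read()).hexdigest()
            # mtime is intentionally not part of the fingerprint.
            entries.append((member.name, member.type, member.mode, member.size, digest))
    return sorted(entries)


def make_tar(files: dict, mtime: int) -> bytes:
    """Build an in-memory tar from {path: (mode, content)} with one mtime."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, (mode, content) in files.items():
            info = tarfile.TarInfo(name)
            info.mode = mode
            info.mtime = mtime
            info.size = len(content)
            tar.addfile(info, io.BytesIO(content))
    return buf.getvalue()


files = {"./usr/bin/hello": (0o755, b"#!/bin/sh\necho hello\n")}
build_one = make_tar(files, mtime=1376610842)
build_two = make_tar(files, mtime=1376697242)
same_payload = tar_fingerprint(build_one) == tar_fingerprint(build_two)
```

The two archives differ as byte strings, yet `same_payload` is true because only the timestamps changed.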

Thanks,
Guillem



Changed Bug title to 'dpkg-deb: Make compression of {data,control}.tar.gz deterministic' from 'dpkg-source: Make compresing of {data,control}.tar.gz a deterministic process' Request was from Guillem Jover <guillem@debian.org> to control@bugs.debian.org. (Mon, 19 Aug 2013 11:03:04 GMT) Full text and rfc822 format available.

Information forwarded to debian-bugs-dist@lists.debian.org, Dpkg Developers <debian-dpkg@lists.debian.org>:
Bug#719844; Package dpkg. (Tue, 27 Aug 2013 16:12:04 GMT) Full text and rfc822 format available.

Acknowledgement sent to Jérémy Bobbio <lunar@debian.org>:
Extra info received and forwarded to list. Copy sent to Dpkg Developers <debian-dpkg@lists.debian.org>. (Tue, 27 Aug 2013 16:12:04 GMT) Full text and rfc822 format available.

Message #37 received at 719844@bugs.debian.org (full text, mbox):

From: Jérémy Bobbio <lunar@debian.org>
To: Guillem Jover <guillem@debian.org>, 719844@bugs.debian.org
Subject: Re: Bug#719844: dpkg-source: Make compresing of {data,control}.tar.gz a deterministic process
Date: Tue, 27 Aug 2013 18:08:44 +0200
Hi!

Guillem Jover:
> I've been thinking about this, and I think you might be trying to
> solve the problem in the wrong place(s), and possibly there's a need
> to step back and ponder what you really want out of all this,
> to know where or how to best fix it.
> […]

My ideal scenario is the following:

 1. I retrieve the .changes file for a package.
 2. I verify the signature on the .changes.
 3. I give the .changes to a "rebuild" tool.
 4. The checksum of the .deb listed in the original .changes file
    and the checksum of the .deb I've just built should match.

I would even like to have the rebuilt .deb compared not only by one
source, but by several.

I would rather avoid having a `dpkg-deb --compare` as you suggested,
because comparing signed checksums is much easier than transferring
`.deb` files between multiple independent builders.
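
The checksum comparison in step 4 could be sketched roughly as follows. This is only an illustration: it assumes the .changes text has already had its signature verified and stripped, it reads the Checksums-Sha256 field, and the function names and sample data are invented here.

```python
import hashlib


def parse_sha256_section(changes_text: str) -> dict:
    """Return {filename: (sha256_hex, size)} from a Checksums-Sha256 field."""
    entries = {}
    in_section = False
    for line in changes_text.splitlines():
        if line.startswith("Checksums-Sha256:"):
            in_section = True
        elif in_section and line.startswith(" "):
            # Continuation lines look like: " <sha256> <size> <filename>"
            digest, size, name = line.split()
            entries[name] = (digest, int(size))
        else:
            in_section = False
    return entries


def matches_changes(changes_text: str, filename: str, rebuilt: bytes) -> bool:
    """Check a rebuilt file against the size and SHA-256 listed for it."""
    digest, size = parse_sha256_section(changes_text)[filename]
    return len(rebuilt) == size and hashlib.sha256(rebuilt).hexdigest() == digest


# Stand-in for the bytes of a rebuilt package.
rebuilt_deb = b"pretend this is hello_2.8-4_amd64.deb"
sample_changes = (
    "Format: 1.8\n"
    "Source: hello\n"
    "Checksums-Sha256: \n"
    f" {hashlib.sha256(rebuilt_deb).hexdigest()} {len(rebuilt_deb)} hello_2.8-4_amd64.deb\n"
    "Files: \n"
    " 0123456789abcdef0123456789abcdef 37 devel optional hello_2.8-4_amd64.deb\n"
)
ok = matches_changes(sample_changes, "hello_2.8-4_amd64.deb", rebuilt_deb)
```

Only the small signed .changes file has to travel between builders; each one hashes its own rebuilt `.deb` locally.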

> And then there's changing information that conveys important data.
> For example, in the generated data.tar the files will contain
> different modification times, some will come untouched from the source
> files if they just get copied, and others will be newer if the files
> got created at build time. Preserving these timestamps seems important
> to me, because you then know the possible staleness of the files.

I disagree that this is important information. Most packages that I have
seen so far do not propagate timestamps when copying a file from source.
Could you give me an example of one that would do so?

If you are set on this, then our only solution is to generalize the use
of faketime. I find this clumsier, but we already encourage using
fakeroot…

> The timestamp on the ar member lets you know when the package got
> built.

I don't believe this adds much value: we have this information in the
.changes files already.

Anyway, I was thinking of adding a `--timestamp=1377619307` option to
`dpkg-deb`. The value would default to `time(NULL)` when the flag is
missing. This could allow the rebuild tool to extract the timestamp from
the .changes file in order to match the initial build. But this has
extra complications given that different calls to dpkg-deb and
dpkg-genchanges will have different timestamps at the moment.
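
The proposed option handling, sketched in Python rather than in dpkg-deb's C (the program name `build-sketch` and the function are hypothetical; only the `--timestamp` flag and its fallback to the current time come from the proposal above):

```python
import argparse
import time


def parse_build_timestamp(argv) -> int:
    """Return the preset timestamp, or the current time when none is given."""
    parser = argparse.ArgumentParser(prog="build-sketch")
    parser.add_argument(
        "--timestamp",
        type=int,
        default=None,
        help="seconds since the epoch to record instead of the current time",
    )
    args = parser.parse_args(argv)
    if args.timestamp is not None:
        return args.timestamp
    # Mirrors the proposed default of time(NULL).
    return int(time.time())


preset = parse_build_timestamp(["--timestamp=1377619307"])
```

A rebuild tool would pass the original build's value explicitly, while an ordinary build would omit the flag and get the current time.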

> Possible future ar members containing gpg signatures from the builder
> or the archive, would change and not be reproducible anyway. The recent
> switch of dpkg-deb default compressor. Or possible future .deb format
> revisions. Etc.

Could you please elaborate? The idea is to record the list of packages
that was initially able to build the package, and to reinstall exactly
the same set of packages (from snapshot.d.o) when performing rebuilds.
I don't really understand how future changes in dpkg would affect
any earlier versions that would get reinstalled to do the rebuild.

-- 
Lunar                                .''`. 
lunar@debian.org                    : :Ⓐ  :  # apt-get install anarchism
                                    `. `'` 
                                      `-   

Information forwarded to debian-bugs-dist@lists.debian.org, Dpkg Developers <debian-dpkg@lists.debian.org>:
Bug#719844; Package dpkg. (Wed, 28 Aug 2013 17:27:10 GMT) Full text and rfc822 format available.

Acknowledgement sent to Jérémy Bobbio <lunar@debian.org>:
Extra info received and forwarded to list. Copy sent to Dpkg Developers <debian-dpkg@lists.debian.org>. (Wed, 28 Aug 2013 17:27:10 GMT) Full text and rfc822 format available.

Message #42 received at 719844@bugs.debian.org (full text, mbox):

From: Jérémy Bobbio <lunar@debian.org>
To: 719844@bugs.debian.org
Subject: Re: Bug#719844: dpkg-source: Make compresing of {data,control}.tar.gz a deterministic process
Date: Wed, 28 Aug 2013 19:22:05 +0200
Jérémy Bobbio:
> Guillem Jover:
> > The timestamp on the ar member lets you know when the package got
> > built.
> 
> I don't believe this adds much value: we have this information in the
> .changes files already.
> 
> Anyway, I was thinking of adding a `--timestamp=1377619307` option to
> `dpkg-deb`. The value would default to `time(NULL)` when the flag is
> missing. This could allow the rebuild tool to extract the timestamp from
> the .changes file in order to match the initial build. But this has
> extra complications given that different calls to dpkg-deb and
> dpkg-genchanges will have different timestamps at the moment.

I have a test implementation of that idea ready. See the attached
patches (extracted from the `pu/reproducible_builds` branch available
from `git://anonscm.debian.org/users/lunar/dpkg.git`).

This makes the following work:

$ apt-get source hello
$ cd hello-2.8
$ dpkg-buildpackage
[…]
$ cp ../hello_2.8-4_amd64.deb ../hello_2.8-4_amd64.deb.orig
$ DEB_BUILD_TIMESTAMP=$(date +%s -d"$(sed -n -e 's/^Date: //p' ../hello_2.8-4_amd64.changes)") dpkg-buildpackage
[…]
$ sha256sum ../hello_2.8-4_amd64.deb ../hello_2.8-4_amd64.deb.orig
1e944abfceac7e593f6706da971e0444e5cee9aab680de5292d52661940ee9c4 ../hello_2.8-4_amd64.deb
1e944abfceac7e593f6706da971e0444e5cee9aab680de5292d52661940ee9c4 ../hello_2.8-4_amd64.deb.orig

You probably will have several comments regarding the implementation,
but I was wondering how you feel about the overall approach.

-- 
Lunar                                .''`. 
lunar@debian.org                    : :Ⓐ  :  # apt-get install anarchism
                                    `. `'` 
                                      `-   
[0001-Use-a-single-timestamp-when-building-a-.deb.patch (text/x-diff, attachment)]
[0002-dpkg-buildpackage-produce-the-same-timestamp-in-.deb.patch (text/x-diff, attachment)]
[0003-Allow-to-preset-dpkg-buildpackage-timestamp.patch (text/x-diff, attachment)]

Information forwarded to debian-bugs-dist@lists.debian.org, Dpkg Developers <debian-dpkg@lists.debian.org>:
Bug#719844; Package dpkg. (Sat, 31 Aug 2013 19:00:04 GMT) Full text and rfc822 format available.

Acknowledgement sent to Jonathan Nieder <jrnieder@gmail.com>:
Extra info received and forwarded to list. Copy sent to Dpkg Developers <debian-dpkg@lists.debian.org>. (Sat, 31 Aug 2013 19:00:04 GMT) Full text and rfc822 format available.

Message #47 received at 719844@bugs.debian.org (full text, mbox):

From: Jonathan Nieder <jrnieder@gmail.com>
To: Jérémy Bobbio <lunar@debian.org>
Cc: 719844@bugs.debian.org
Subject: Re: dpkg-source: Make compresing of {data,control}.tar.gz a deterministic process
Date: Sat, 31 Aug 2013 11:57:14 -0700
Hi,

Jérémy Bobbio wrote:
> Guillem Jover:

>> For example, in the generated data.tar the files will contain
>> different modification times, some will come untouched from the source
>> files if they just get copied, and others will be newer if the files
>> got created at build time. Preserving these timestamps seems important
>> to me, because you then know the possible staleness of the files.
>
> I disagree that this is important information. Most packages that I have
> seen so far do not propagate timestamps when copying a file from source.
> Could you give me an example of one that would do so?

See http://www.debian.org/doc/debian-policy/ch-source.html#s-timestamps

[...]
>> Possible future ar members containing gpg signatures from the builder
>> or the archive, would change and not be reproducible anyway. The recent
>> switch of dpkg-deb default compressor. Or possible future .deb format
>> revisions. Etc.
>
> Could you please elaborate? The idea is to record the list of packages
> that was initially able to build the package, and to reinstall exactly
> the same set of packages (from snapshot.d.o) when performing rebuilds.

That can be problematic when packages used during the build get
security fixes, fixes for new hardware support, and so on.  Not to
mention sources of randomness during the build, such as parallelism.

But I don't want to discourage you --- it's an interesting project,
even if I think it's a doomed one. ;-)

Thanks,
Jonathan



Information forwarded to debian-bugs-dist@lists.debian.org, Dpkg Developers <debian-dpkg@lists.debian.org>:
Bug#719844; Package dpkg. (Sat, 31 Aug 2013 19:24:04 GMT) Full text and rfc822 format available.

Acknowledgement sent to Jérémy Bobbio <lunar@debian.org>:
Extra info received and forwarded to list. Copy sent to Dpkg Developers <debian-dpkg@lists.debian.org>. (Sat, 31 Aug 2013 19:24:04 GMT) Full text and rfc822 format available.

Message #52 received at 719844@bugs.debian.org (full text, mbox):

From: Jérémy Bobbio <lunar@debian.org>
To: Jonathan Nieder <jrnieder@gmail.com>, 719844@bugs.debian.org
Subject: Re: Bug#719844: dpkg-source: Make compresing of {data,control}.tar.gz a deterministic process
Date: Sat, 31 Aug 2013 21:21:29 +0200
Jonathan Nieder:
> > I disagree that this is important information. Most packages that I have
> > seen so far do not propagate timestamps when copying a file from source.
> > Could you give me an example of one that would do so?
>
> See http://www.debian.org/doc/debian-policy/ch-source.html#s-timestamps

This had slipped my mind. Thanks for the pointer.

I still would like to be pointed at a package that does this, though.

> > The idea is to record the list of packages that was initially able
> > to build the package and to reinstall exactly the same set of
> > packages (from snapshot.d.o) when performing rebuilds.
>
> That can be problematic when packages used during the build get
> security fixes, fixes for new hardware support, and so on.

Problematic how? If it did build once, it should be rebuildable.

> Not to mention sources of randomness during the build, such as
> parallelism.

A build system that produces an output that varies depending on the
order of its computations should be fixed. Otherwise, it sounds like a
great source of heisenbugs and other pains.

-- 
Lunar                                .''`. 
lunar@debian.org                    : :Ⓐ  :  # apt-get install anarchism
                                    `. `'` 
                                      `-   

Information forwarded to debian-bugs-dist@lists.debian.org, Dpkg Developers <debian-dpkg@lists.debian.org>:
Bug#719844; Package dpkg. (Sun, 01 Sep 2013 09:42:04 GMT) Full text and rfc822 format available.

Acknowledgement sent to Jérémy Bobbio <lunar@debian.org>:
Extra info received and forwarded to list. Copy sent to Dpkg Developers <debian-dpkg@lists.debian.org>. (Sun, 01 Sep 2013 09:42:04 GMT) Full text and rfc822 format available.

Message #57 received at 719844@bugs.debian.org (full text, mbox):

From: Jérémy Bobbio <lunar@debian.org>
To: 719844@bugs.debian.org
Subject: Re: Bug#719844: dpkg-source: Make compresing of {data,control}.tar.gz a deterministic process
Date: Sun, 1 Sep 2013 11:38:50 +0200
Jérémy Bobbio:
> Jonathan Nieder:
> > > I disagree that this is important information. Most packages that I have
> > > seen so far do not propagate timestamps when copying a file from source.
> > > Could you give me an example of one that would do so?
> >
> > See http://www.debian.org/doc/debian-policy/ch-source.html#s-timestamps
> 
> This had slipped my mind. Thanks for the pointer.

If dpkg is getting its own tar implementation, I think the following
could work:

If we record the time when the build starts, we know that any files
created after that time are newly created files. So we can use a unique
timestamp for all of them.

(This could be done manually by touch'ing the files.)

Together with the approach mentioned previously (using
DEB_BUILD_TIMESTAMP), this could allow rebuilds with the aforementioned
unique timestamp preset to the date of the initial build.
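
A sketch of that idea using Python's `tarfile` module (dpkg had no such feature at the time of this message; `clamp_mtime` and `repack_with_clamped_mtimes` are hypothetical names): any member modified after the build started gets one fixed timestamp, while older members keep the timestamps they brought from the source.

```python
import io
import tarfile


def clamp_mtime(member: tarfile.TarInfo, build_start: int, stamp: int) -> tarfile.TarInfo:
    """Give files created during the build a single, preset timestamp."""
    if member.mtime > build_start:
        member.mtime = stamp
    return member


def repack_with_clamped_mtimes(tar_bytes: bytes, build_start: int, stamp: int) -> bytes:
    """Copy a tar archive, clamping the mtime of members newer than build_start."""
    out = io.BytesIO()
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as src, \
            tarfile.open(fileobj=out, mode="w") as dst:
        for member in src:
            # Only regular files carry a payload that must be copied.
            payload = src.extractfile(member) if member.isfile() else None
            dst.addfile(clamp_mtime(member, build_start, stamp), payload)
    return out.getvalue()
```

On a rebuild, `stamp` would be preset to the date of the initial build (for instance via DEB_BUILD_TIMESTAMP), so files generated at different wall-clock times end up with the same recorded mtime.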

-- 
Lunar                                .''`. 
lunar@debian.org                    : :Ⓐ  :  # apt-get install anarchism
                                    `. `'` 
                                      `-   

Added tag(s) pending. Request was from Guillem Jover <guillem@debian.org> to control@bugs.debian.org. (Thu, 17 Oct 2013 06:27:14 GMT) Full text and rfc822 format available.

Message sent on to Asheesh Laroia <asheesh@asheesh.org>:
Bug#719844. (Thu, 17 Oct 2013 06:27:27 GMT) Full text and rfc822 format available.

Message #62 received at 719844-submitter@bugs.debian.org (full text, mbox):

From: Guillem Jover <guillem@debian.org>
To: 719844-submitter@bugs.debian.org
Subject: Bug#719844 marked as pending
Date: Thu, 17 Oct 2013 06:23:05 +0000
tag 719844 pending
thanks

Hello,

Bug #719844 reported by you has been fixed in the Git repository. You can
see the changelog below, and you can check the diff of the fix at:

    http://git.debian.org/?p=dpkg/dpkg.git;a=commitdiff;h=bd58cab

---
commit bd58cab620d35bd34021578c97904921cdca45bd
Author: Guillem Jover <guillem@debian.org>
Date:   Sun Aug 18 11:49:42 2013 +0200

    libdpkg: Do not store timestamps in gzip headers when using the command
    
    The zlib library by default does not initialize the gzip header with
    information like OS, filename or timestamp. Try to do the same when
    using the gzip command, although there's no way to tell the command
    not to store the OS.
    
    Closes: #719844

diff --git a/debian/changelog b/debian/changelog
index 27124f2..8e1e542 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -54,6 +54,9 @@ dpkg (1.17.2) UNRELEASED; urgency=low
   * Improve dpkg “Preparing to replace” and “Unpacking” progress messages.
     Closes: #32427, #71106
   * Print the package version on main dpkg progress messages.
+  * Do not store timestamps in gzip headers when using the command, to try to
+    mimic the zlib behavior. This does not affect Debian as it's been using
+    zlib for a very long time. Closes: #719844
 
   [ Updated programs translations ]
   * German (Sven Joachim).



Reply sent to Guillem Jover <guillem@debian.org>:
You have taken responsibility. (Thu, 05 Dec 2013 06:07:12 GMT) Full text and rfc822 format available.

Notification sent to Asheesh Laroia <asheesh@asheesh.org>:
Bug acknowledged by developer. (Thu, 05 Dec 2013 06:07:12 GMT) Full text and rfc822 format available.

Message #67 received at 719844-close@bugs.debian.org (full text, mbox):

From: Guillem Jover <guillem@debian.org>
To: 719844-close@bugs.debian.org
Subject: Bug#719844: fixed in dpkg 1.17.2
Date: Thu, 05 Dec 2013 06:03:48 +0000
Source: dpkg
Source-Version: 1.17.2

We believe that the bug you reported is fixed in the latest version of
dpkg, which is due to be installed in the Debian FTP archive.

A summary of the changes between this version and the previous one is
attached.

Thank you for reporting the bug, which will now be closed.  If you
have further comments please address them to 719844@bugs.debian.org,
and the maintainer will reopen the bug report if appropriate.

Debian distribution maintenance software
pp.
Guillem Jover <guillem@debian.org> (supplier of updated dpkg package)

(This message was generated automatically at their request; if you
believe that there is a problem with it please contact the archive
administrators by mailing ftpmaster@ftp-master.debian.org)


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Format: 1.8
Date: Thu, 05 Dec 2013 04:56:31 +0100
Source: dpkg
Binary: libdpkg-dev dpkg dpkg-dev libdpkg-perl dselect
Architecture: source amd64 all
Version: 1.17.2
Distribution: unstable
Urgency: low
Maintainer: Dpkg Developers <debian-dpkg@lists.debian.org>
Changed-By: Guillem Jover <guillem@debian.org>
Description: 
 dpkg       - Debian package management system
 dpkg-dev   - Debian package development tools
 dselect    - Debian package management front-end
 libdpkg-dev - Debian package management static library
 libdpkg-perl - Dpkg perl modules
Closes: 32427 71106 143307 187019 257505 583585 615813 661538 667008 681370 717983 718437 718541 718899 718945 719418 719746 719844 720712 725437 726112 726932 729874
Changes: 
 dpkg (1.17.2) unstable; urgency=low
 .
   [ Guillem Jover ]
   * Make Dpkg::Arch debwildcard_to_debtriplet() more robust by matching
     on exact 'any' strings, instead of substrings.
   * Add manpages-it Replaces to dselect and dpkg-dev. Closes: #717983
     Reported by Andreas Beckmann <anbe@debian.org>.
   * Document default dpkg-deb compressor change to xz in man page.
     Thanks to Salvatore Bonaccorso <carnil@debian.org>. Closes: #718437
   * Version manpages-it in Replaces with (<< 2.80-4), now that the package
     does not ship the overlapping paths any more.
   * Automatically prepend needed spaces for continuation --force-help lines.
   * Be more precise on deb format errors with data member in dpkg-deb.
   * Do not allow deb packages with control and data members swapped.
   * Clarify «dpkg-deb --extract» bad usage error message on missing arguments
     by printing all required arguments at once. Closes: #718899
   * Clarify the insertion order of _ members in deb(5) man page.
   * Fix use after free in alternative_parse_fileset() on update-alternatives.
     Reported by Pedro Ribeiro <pedrib@gmail.com>.
   * Fix use after free in dpkg_arch_load_list() on libdpkg.
     Reported by Pedro Ribeiro <pedrib@gmail.com>.
   * Fix theoretical stack buffer overflow in w_dependency() on libdpkg, not
     currently applicable. Reported by Pedro Ribeiro <pedrib@gmail.com>.
   * Add ppc64el support to cputable. Closes: #718945
     Thanks to Jeff Bailey <jeffbailey@google.com>.
   * Use dpkg-gencontrol -c argument as a fallback lock file in case
     debian/control does not exist. Closes: #667008
   * Pass the package reference count (i.e. number of present instances) to
     maintainer scripts via the new variable DPKG_MAINTSCRIPT_PACKAGE_REFCOUNT.
     Closes: #681370
   * Fix field names on error messages in libdpkg, by either capitalizing them
     or by renaming them to match reality.
   * Do not capitalize error and warning messages.
   * When ignoring invalid remove requests in dpkg consider that progress,
     reset the loop detector and avoid the assert. Closes: #143307
   * Activate all path components for file triggers on removal.
     Regression introduced in 1.17.0. Closes: #725437
   * Activate file triggers on disappearance more accurately, only when we know
     we are inevitably removing things.
   * Fix «dpkg-query --list» output when using multibyte character strings,
     to avoid unaligned columns and mojibake. Closes: #257505, #718541
     Based on a patch by Changwoo Ryu <cwryu@debian.org>.
   * Use fully buffered output on non-tty stdout.
     Reported by Shawn Landden <shawnlandden@gmail.com>.
   * Recognize «start-stop-daemon -C» as documented. Closes: #719746
     Reported by Brian S. Julin <bri@abrij.org>.
   * When update-alternatives is told to change slave links, do not warn that
     the link group is broken, just print a notice that the alternative is
     being updated due to the changes.
   * Add a new «dpkg --verify» command to check the integrity of packages
     installed files. Add a --verify-format option to explicitly select the
     output format, currently only rpm compatible output is supported, but
     the default might change in the future. Closes: #187019
   * Improve dpkg “Preparing to replace” and “Unpacking” progress messages.
     Closes: #32427, #71106
   * Print the package version on main dpkg progress messages.
   * Do not store timestamps in gzip headers when using the command, to try to
     mimic the zlib behavior. This does not affect Debian as it's been using
     zlib for a very long time. Closes: #719844
   * Reset environment variables affecting compressor commands when not using
     the shared library implementations. Namely XZ_DEFAULTS, XZ_OPT, BZIP and
     BZIP2.
   * Use a simple list to track packages owning a file, instead of using a
     list of arrays of pointers which waste 10 pointers per non-shared file,
     instead of 1. This significantly reduces dpkg memory usage.
   * Honor new DEB_SIGN_KEYID environment variable in dpkg-buildpackage.
     Suggested by Harald Dunkel <harri@afaics.de>. Closes: #615813, #719418
   * Always check subprocess exit codes in Dpkg::Source::Package modules.
     Reported by Ian Jackson <ijackson@chiark.greenend.org.uk>.
   * Add support for pie and stack-protector options to dpkg-buildflags FFLAGS,
     and update the man page to mention FFLAGS are a subset of CFLAGS.
     Closes: #726932
   * Improve and unify -O option handling in dpkg-genchanges, dpkg-gensymbols
     and dpkg-shlibdeps, by always taking an optional filename argument and
     describing in the man page the default output files.
   * Use “hyphen” instead of “dash” when we mean the ‘-’ character in the
     documentation and code comments.
   * Do not NULL-terminate the list in the compat scandir(), as this might
     cause a segfault in case the function returns 0 entries.
   * Always return from ensure_statoverrides() if file is NULL, otherwise
     we might get us to read garbage from memory or segfault.
   * Add new symlink_to_dir command to dpkg-maintscript-helper. Closes: #720712
     Based on a patch by Bastien ROUCARIÈS <roucaries.bastien@gmail.com>.
   * Add new dir_to_symlink command to dpkg-maintscript-helper. Closes: #583585
   * Distinguish dpkg error reports between errors while processing packages
     and archives.
   * Fix crashes in the first call to gettext() after fork() on Mac OS X, by
     forcing the initialization at program start of the CoreFoundation cached
     values in libintl.
   * Set a default gettext domain for libdpkg code, so that other programs
     using a different domain can still get correct translations, like dselect.
   * Cleanup libdpkg-perl API:
     - Dpkg::Compression: Deprecate $default_compression_level,
       $default_compression and $compression_re_file_ext package variables.
     - Dpkg::Exit: Deprecate @handlers package variable.
     - Dpkg::Source::Package: Deprecate $diff_ignore_default_regexp and
       @tar_ignore_default_pattern package variables.
     - Dpkg::Changelog::Entry::Debian: Deprecate $regex_header and
       $regex_trailer package variables.
   * Add GnuPG 2.x support. Add gnupg2 and gpgv2 as alternative Recommends to
     gnupg and gpgv (to not pull them by default), but prefer gpgv2 over gpgv,
     and gpg2 over gpg at run-time if they are available.
   * Switch dpkg conflictor tracking from a fixed-size array to a queue,
     fixing several related issues, due to conflictors not being removed from
     the array after processing them. dpkg could fill it due to additions in
     previous package processing producing very confusing error messages; and
     a theoretical problem where a package could get appended to be removed,
     then reinstalled as a new version, to get removed again when revisiting
     the array in a subsequent package processing. Closes: #726112
   * Do not accept empty field names in dpkg.
   * Do not accept an initial hyphen in field names.
   * Add experimental build profiles support:
     - Add support for <!profile.name> build-time restrictions in dependencies.
     - Add support for DEB_BUILD_PROFILES environment variable.
     - Add new option -P to dpkg-buildpackage and dpkg-checkbuilddeps.
     - Add new Built-For-Profiles output field in .deb and .changes files.
     Based on a patch by Patrick "P. J." McDermott <pjm@nac.net>,
     Wookey <wookey@debian.org> and Johannes Schauer <j.schauer@email.de>.
     Closes: #661538
   * Bump Standards-Version to 3.9.5.
   * Document interactions of dpkg-source --extend-diff-ignore and -i in the
     man page. Closes: #729874
 .
   [ Updated programs translations ]
   * German (Sven Joachim).
   * Vietnamese (Trần Ngọc Quân).
 .
   [ Updated scripts translations ]
   * German (Helge Kreutzmann).
 .
   [ Updated manpages translations ]
   * French (Christian Perrier): fix incorrectly translated sentence,
     thanks to Fabien Givors.
   * German (Helge Kreutzmann).
Checksums-Sha1: 
 c4f7042c038c3174f6f170e359d59e167788fe03 2005 dpkg_1.17.2.dsc
 ef6d6e3dba5be41b86e1befe1cf86b51adaaaf36 3829700 dpkg_1.17.2.tar.xz
 df8f5ee146fad7af246f627e7f9a338e89e219f8 735532 libdpkg-dev_1.17.2_amd64.deb
 d66336876dd4aac3caa2d2f27a5742f40dc48d4c 2599582 dpkg_1.17.2_amd64.deb
 5391ca8e435ac4c9a9e5e3f4f599e6d0ddfa0ff2 995850 dselect_1.17.2_amd64.deb
 b2ba52087ebbac112b4a2c63700e79e5d91928cb 1376728 dpkg-dev_1.17.2_all.deb
 25356eba113913a90863be668cdf24f2625896a0 905766 libdpkg-perl_1.17.2_all.deb
Checksums-Sha256: 
 c99d474673f45a85156dcdc3173d899ef6decbfead599b87026000770538239f 2005 dpkg_1.17.2.dsc
 0a1c2b4d7a5a485b53263448c0a6f111ca4fe5d774cddf0abd7ce02758db8650 3829700 dpkg_1.17.2.tar.xz
 6812d5ea71c7ba237452589eb81a14ab10fd16e456ebc761063b20c71f61cd35 735532 libdpkg-dev_1.17.2_amd64.deb
 1f667dad1a22f215c762efaaf2d4dc2c3db0c5c61f1bc6982785a6a9b3edc4b9 2599582 dpkg_1.17.2_amd64.deb
 f5cd2208024a044b2c40f3e78ea592f0da312203524d282b79c6f0113ba4e103 995850 dselect_1.17.2_amd64.deb
 ba5198eec4b67ba616197493fc7a143c2a0781910436b49c13003350fae72788 1376728 dpkg-dev_1.17.2_all.deb
 b660b588f508c5a066dd6d995c805bf8896085bb182280343400bae5198b1f25 905766 libdpkg-perl_1.17.2_all.deb
Files: 
 c53e4ebc6dbcde14397a060c1357e569 2005 admin required dpkg_1.17.2.dsc
 35d42dac927d31152a22f50637994d51 3829700 admin required dpkg_1.17.2.tar.xz
 aa991c285663d59e84fa8642bd548ff3 735532 libdevel optional libdpkg-dev_1.17.2_amd64.deb
 3ed7cbafd833e43648984871bb090d05 2599582 admin required dpkg_1.17.2_amd64.deb
 411f6bda343a2df4e7d630b03b99140a 995850 admin optional dselect_1.17.2_amd64.deb
 6c36f5b1450d965afe038b5577730c44 1376728 utils optional dpkg-dev_1.17.2_all.deb
 28900795bce285bf30a5f2f917a271b9 905766 perl optional libdpkg-perl_1.17.2_all.deb

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.15 (GNU/Linux)

iQIcBAEBCAAGBQJSoAcoAAoJELlyvz6krlejSNwQAOSerUBsifpUSSPK0rc4YB4E
knkRqIcEEiWmjYKddLHS58fy7h6AXPz/OO55+iiN093eDSaDF1KSM4HT7k2Ev8sq
r7n4MNCLHyVajM/mnvIYnihbm/jJZmHkGmBMLmirsJTY7mpopLE+BRI8QFXb0fiJ
oawlPspvQcs5bFvZUiUxuSyWGhuYs+TIP+3+rMEX0Xe4+//N577lmO8GEUOnBMVf
7jqxn02yAvqE/pIv208WFWW+Go3B/7jhudOeDBPoHIf3JyxMQhSkFEccSYUM+yXB
mvNOi44hrhG9w7J1tQEuMHXqnPmSIp87bcoqTCFGjuDqRJNHM/fTCJuYsYSFbNDl
MlxLqN6Oq0AmWEiakrPYHBgJR5LZDhCMohOlbp4D7GW3wtjz4Z4uddXq4xL8aItE
o+rl44J86iOqwBCkZ/iQYFaIirjosl5souso3AOsg2igumrkdB0y1q1HeKUjtXma
k4byEj0FEpwh03HAbABQv3LDm4UKwL7QF9D8gDZ5emCGH9JbuHFeO+N/6yB9LS+A
rHa1bedUbfLoBiWtBx+eTGfSCLBj0U9mI2MuEusItvR9xsC+KuCnt4sIRwpwuymt
GF8yqMw0FMSDa6Aavcn0U9EPYS+vPtSiV23CKljP5kI8DHSVxUGbto3pkfbtGqtg
sw0ZOWT/NClNjVfpezTY
=5lAo
-----END PGP SIGNATURE-----
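
The 1.17.2 changes above also mention resetting XZ_DEFAULTS, XZ_OPT, BZIP and BZIP2 when an external compressor command is used, since options picked up from the environment could change the output. A rough sketch of that idea in Python follows. It is not dpkg's implementation, and the names COMPRESSOR_ENV_VARS, sanitized_env and run_compressor are hypothetical.

```python
import os
import subprocess

# Variables named in the changelog entry; the tuple name is made up here.
COMPRESSOR_ENV_VARS = ("XZ_DEFAULTS", "XZ_OPT", "BZIP", "BZIP2")


def sanitized_env(env=None):
    """Return a copy of env (default: os.environ) without compressor options."""
    base = dict(os.environ if env is None else env)
    for name in COMPRESSOR_ENV_VARS:
        base.pop(name, None)  # ignore variables that are not set
    return base


def run_compressor(argv, data):
    """Feed data to the compressor command on stdin and return its stdout."""
    proc = subprocess.run(
        argv,
        input=data,
        stdout=subprocess.PIPE,
        env=sanitized_env(),
        check=True,  # raise if the compressor exits with a non-zero status
    )
    return proc.stdout
```

Assuming an xz binary is installed, a call could look like run_compressor(["xz", "-c"], b"data"). The child process then sees the caller's environment minus the four variables.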




Bug archived. Request was from Debbugs Internal Request <owner@bugs.debian.org> to internal_control@bugs.debian.org. (Fri, 03 Jan 2014 07:27:53 GMT)



Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Thu Apr 24 23:02:06 2014; Machine Name: buxtehude.debian.org

Debian Bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.