Debian Bug report logs - #627179
multistrap: Using retainsources=dir does not retain some sources

Package: multistrap; Maintainer for multistrap is Neil Williams <codehelp@debian.org>; Source for multistrap is src:multistrap.

Reported by: David Kuehling <dvdkhlng@gmx.de>

Date: Wed, 18 May 2011 13:12:02 UTC

Severity: normal

Tags: moreinfo

Fixed in version multistrap/2.1.15

Done: Neil Williams <codehelp@debian.org>

Bug is archived. No further changes may be made.


Report forwarded to debian-bugs-dist@lists.debian.org, dvdkhlng@gmx.de, Neil Williams <codehelp@debian.org>:
Bug#627179; Package multistrap. (Wed, 18 May 2011 13:12:05 GMT) Full text and rfc822 format available.

Acknowledgement sent to David Kuehling <dvdkhlng@gmx.de>:
New Bug report received and forwarded. Copy sent to dvdkhlng@gmx.de, Neil Williams <codehelp@debian.org>. (Wed, 18 May 2011 13:12:06 GMT) Full text and rfc822 format available.

Message #5 received at submit@bugs.debian.org (full text, mbox):

From: David Kuehling <dvdkhlng@gmx.de>
To: Debian Bug Tracking System <submit@bugs.debian.org>
Subject: multistrap: Using retainsources=dir does not retain some sources
Date: Wed, 18 May 2011 15:09:44 +0200
[Message part 1 (text/plain, inline)]
Package: multistrap
Severity: normal

It looks like multistrap 2.1.13 (from svn) is incapable of downloading
source for some packages.  For example, the source code for bash is
missing when used with Ubuntu Lucid repositories.

The problem seems to be caused by multistrap querying the 'Source' field
of the corresponding binary packages to see what source-packages to
get.  Some packages do not have any 'Source' field, in which case the
binary Package name ought to be used.

i.e. multistrap line 527:

		my $src=`LC_ALL=C dpkg -f ./${cachedir}archives/$deb Source`;

should fall back to using $deb if $src is not set.
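
A minimal sketch of that fallback, written in Python rather than
multistrap's Perl. The function names `parse_stanza` and `source_name`
are hypothetical and not part of multistrap. The sketch assumes the
fallback should use the binary's Package field (rather than the .deb
file name), and it also strips the optional "(version)" suffix that a
Source field may carry.

```python
def parse_stanza(text):
    """Parse one control stanza into a dict, joining continuation lines."""
    fields = {}
    key = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t" and key is not None:
            # Continuation line: append to the previous field.
            fields[key] += "\n" + line.strip()
        elif ":" in line:
            key, _, value = line.partition(":")
            key = key.strip()
            fields[key] = value.strip()
    return fields


def source_name(fields):
    """Return the source package name for a binary package stanza."""
    source = fields.get("Source", "").strip()
    if not source:
        # No Source field: the source package shares the binary's name.
        return fields["Package"]
    # "Source: name (version)" -> keep only the name.
    return source.split()[0]


# Example: a stanza without a Source field falls back to Package.
stanza = parse_stanza("Package: bash\nArchitecture: amd64\n")
print(source_name(stanza))
```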

cheers,

David
-- 
GnuPG public key: http://user.cs.tu-berlin.de/~dvdkhlng/dk.gpg
Fingerprint: B17A DC95 D293 657B 4205  D016 7DEF 5323 C174 7D40
[Message part 2 (application/pgp-signature, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org, Neil Williams <codehelp@debian.org>:
Bug#627179; Package multistrap. (Mon, 23 May 2011 11:42:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to David Kuehling <dvdkhlng@gmx.de>:
Extra info received and forwarded to list. Copy sent to Neil Williams <codehelp@debian.org>. (Mon, 23 May 2011 11:42:03 GMT) Full text and rfc822 format available.

Message #10 received at 627179@bugs.debian.org (full text, mbox):

From: David Kuehling <dvdkhlng@gmx.de>
To: 627179@bugs.debian.org
Cc: debian-embedded@lists.debian.org
Subject: [PATCH] fix #627179 (multistrap misses some source packages)
Date: Mon, 23 May 2011 13:40:05 +0200
[Message part 1 (text/plain, inline)]
Hi,

the attached patch (mostly) fixes bug #627179 [1].  Patch is against
multistrap SVN head [2].

The patch misses one occurrence of the bug, when multistrap looks at
var/lib/dpkg/status looking for Source: headers only (ignoring Version:
and Package:).  Fixing that feels beyond my perl skills, and I'm
not sure that this whole part is required anyways (it's redundant with
checking the downloaded .debs).  For now I put a big Todo: comment on
top.

That said, for me the patch fixes the problem with missing sources for
the multistrap.conf I test with.  

The patch also fixes another bug, not yet reported: multistrap could
have fetched source packages versions that differ from the binary
package versions.

cheers,

David

[1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=627179
[2] http://www.emdebian.org/svn/current/host/trunk/multistrap/trunk
-- 
GnuPG public key: http://dvdkhlng.users.sourceforge.net/dk.gpg
Fingerprint: B17A DC95 D293 657B 4205  D016 7DEF 5323 C174 7D40

[fix-627179-retainsources.patch (text/x-diff, attachment)]
[Message part 3 (application/pgp-signature, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org:
Bug#627179; Package multistrap. (Sat, 11 Jun 2011 19:09:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to Neil Williams <codehelp@debian.org>:
Extra info received and forwarded to list. (Sat, 11 Jun 2011 19:09:03 GMT) Full text and rfc822 format available.

Message #15 received at 627179@bugs.debian.org (full text, mbox):

From: Neil Williams <codehelp@debian.org>
To: dvdkhlng@gmx.de, 627179@bugs.debian.org
Cc: control@bugs.debian.org
Subject: Re: Bug#627179: multistrap: Using retainsources=dir does not retain some sources
Date: Sat, 11 Jun 2011 20:06:54 +0100
[Message part 1 (text/plain, inline)]
tag 627179 + moreinfo
tag 627179 - patch
quit

On Wed, 18 May 2011 15:09:44 +0200
David Kuehling <dvdkhlng@gmx.de> wrote:

> the attached patch (mostly) fixes bug #627179 [1].  Patch is against
> multistrap SVN head [2].

The patch looks interesting but incomplete and possibly misleading.
 
> The patch misses one occurrence of the bug, when multistrap looks at
> var/lib/dpkg/status looking for Source: headers only (ignoring Version:
> and Package:).  Fixing that feels beyond my perl skills, and I'm
> not sure that this whole part is required anyways (it's redundant with
> checking the downloaded .debs).  For now I put a big Todo: comment on
> top.

Think about this more carefully. The situation is that multistrap is
stateless and something can have happened which means that the run when
the packages are actually downloaded failed at a later stage (e.g. in
the hooks or setupscript) and then got fixed. So a later run of
multistrap still needs to go through the status file (because the .debs
have been unpacked and deleted) to check if some source packages still
need to be downloaded. apt-get install will check the status file and
report that the packages are already at the newest version, without
downloading anything, so the list has to come from somewhere else. i.e.
the list of downloaded debs is untrustworthy and must be regarded as
incomplete.

> That said, for me the patch fixes the problem with missing sources for
> the multistrap.conf I test with.  

More testing required. I hope to get some time to look at this soon but
it needs a lot more thought.
 
> The patch also fixes another bug, not yet reported: multistrap could
> have fetched source packages versions that differ from the binary
> package versions.

That is more about differences in aptsources and debootstrap lines than
anything to do with specifying the version. I don't think your patch
actually works here. apt-get source will get the latest, just as
apt-get install will get the latest. What changes is whether the call
is made when aptsources are active or when bootstrap sources are
active. It needs to be bootstrap sources. I'd need to have a real
example of where apt-get install will download a different version to
what apt-get source will download for the same sources - that would be
a bug in apt, not multistrap. (Multistrap creates deb-src lines for
each source specified, so the versions are expected to be the same from
deb to deb-src or else there are problems with the archive.)

-- 


Neil Williams
=============
http://www.linux.codehelp.co.uk/

[Message part 2 (application/pgp-signature, inline)]

Added tag(s) moreinfo. Request was from Neil Williams <codehelp@debian.org> to control@bugs.debian.org. (Sat, 11 Jun 2011 19:09:05 GMT) Full text and rfc822 format available.

Information forwarded to debian-bugs-dist@lists.debian.org, Neil Williams <codehelp@debian.org>:
Bug#627179; Package multistrap. (Wed, 15 Jun 2011 15:54:04 GMT) Full text and rfc822 format available.

Acknowledgement sent to David Kuehling <dvdkhlng@gmx.de>:
Extra info received and forwarded to list. Copy sent to Neil Williams <codehelp@debian.org>. (Wed, 15 Jun 2011 15:54:04 GMT) Full text and rfc822 format available.

Message #22 received at 627179@bugs.debian.org (full text, mbox):

From: David Kuehling <dvdkhlng@gmx.de>
To: Neil Williams <codehelp@debian.org>
Cc: 627179@bugs.debian.org
Subject: Re: Bug#627179: multistrap: Using retainsources=dir does not retain some sources
Date: Wed, 15 Jun 2011 17:49:50 +0200
[Message part 1 (text/plain, inline)]
>>>>> "Neil" == Neil Williams <codehelp@debian.org> writes:

> On Wed, 18 May 2011 15:09:44 +0200
> David Kuehling <dvdkhlng@gmx.de> wrote:

>> the attached patch (mostly) fixes bug #627179 [1].  Patch is against
>> multistrap SVN head [2].

> The patch looks interesting but incomplete and possibly misleading.

I understand that it's incomplete, but I do not think it is more
'misleading' than the code that it attempts to fix :)
 
>> The patch misses one occurrence of the bug, when multistrap looks at
>> var/lib/dpkg/status looking for Source: headers only (ignoring
>> Version: and Package:).  Fixing that feels beyond my perl
>> skills, and I'm not sure that this whole part is required anyways
>> (it's redundant with checking the downloaded .debs).  For now I put a
>> big Todo: comment on top.

> Think about this more carefully. The situation is that multistrap is
> stateless and something can have happened which means that the run
> when the packages are actually downloaded failed at a later stage
> (e.g. in the hooks or setupscript) and then got fixed. So a later run
> of multistrap still needs to go through the status file (because the
> .debs have been unpacked and deleted) to check if some source packages
> still need to be downloaded. apt-get install will check the status
> file and report that the packages are already at the newest
> version, without downloading anything, so the list has to come from
> somewhere else. i.e.  the list of downloaded debs is untrustworthy and
> must be regarded as incomplete.

Ok, if this is the case, then why do we have to collect source packages
(dsclist) at 3 places in multistrap.conf?  Wouldn't it be sufficient to
just do it once, when parsing the status file?

>> That said, for me the patch fixes the problem with missing sources
>> for the multistrap.conf I test with.

> More testing required. I hope to get some time to look at this soon
> but it needs a lot more thought.

I'm willing to invest the time to fix it; anything is better than
maintaining my own version of Debian stuff.
 
>> The patch also fixes another bug, not yet reported: multistrap could
>> have fetched source packages versions that differ from the binary
>> package versions.

> That is more about differences in aptsources and debootstrap lines
> than anything to do with specifying the version. I don't think your
> patch actually works here. apt-get source will get the latest, just as
> apt-get install will get the latest. What changes is whether the call
> is made when aptsources are active or when bootstrap sources are
> active. It needs to be bootstrap sources. I'd need to have a real
> example of where apt-get install will download a different version to
> what apt-get source will download for the same sources - that would be
> a bug in apt, not multistrap. (Multistrap creates deb-src lines for
> each source specified, so the versions are expected to be the same
> from deb to deb-src or else there are problems with the archive.)

That's exactly the problem: inconsistent versions in the archive or
archive updates while multistrap runs.  With the current implementation
those won't be detected.  IMO this is a severe error that can cause
commercial distributors of images real pain due to the resulting GPL
violation.

So what work needs to be done for the patch to be accepted?  

  - Drop the explicit versioning of source packages?

  - Fix the parsing of var/lib/dpkg/status in tidy_apt to use
    package-name in case that Source: is not present

  - what else did I miss?
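
The second item above can be sketched as follows. This is Python, not
the Perl used in multistrap's tidy_apt, and the names `iter_stanzas`
and `sources_from_status` are hypothetical. It assumes that packages
whose Status ends in "not-installed" or "config-files" need no source,
which is an assumption of this sketch rather than multistrap's
documented behaviour.

```python
def iter_stanzas(text):
    """Yield each blank-line-separated stanza of a status file as a dict."""
    for block in text.split("\n\n"):
        fields, key = {}, None
        for line in block.splitlines():
            if line[:1] in (" ", "\t"):
                # Continuation line (e.g. a long Description).
                if key is not None:
                    fields[key] += "\n" + line.strip()
            elif ":" in line:
                key, _, value = line.partition(":")
                fields[key] = value.strip()
        if fields:
            yield fields


def sources_from_status(text):
    """Return the set of source package names for unpacked/installed packages."""
    names = set()
    for fields in iter_stanzas(text):
        if "Package" not in fields:
            continue
        status = fields.get("Status", "")
        if status.endswith("not-installed") or status.endswith("config-files"):
            continue
        source = fields.get("Source", "")
        # Use the Source name if present, otherwise the package name.
        names.add(source.split()[0] if source else fields["Package"])
    return names
```

Collecting names into a set means that several binaries built from one
source package result in a single entry.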

cheers,

David
-- 
GnuPG public key: http://dvdkhlng.users.sourceforge.net/dk.gpg
Fingerprint: B17A DC95 D293 657B 4205 D016 7DEF 5323 C174 7D40
[Message part 2 (application/pgp-signature, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org:
Bug#627179; Package multistrap. (Wed, 15 Jun 2011 20:57:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to Neil Williams <codehelp@debian.org>:
Extra info received and forwarded to list. (Wed, 15 Jun 2011 20:57:03 GMT) Full text and rfc822 format available.

Message #27 received at 627179@bugs.debian.org (full text, mbox):

From: Neil Williams <codehelp@debian.org>
To: dvdkhlng@gmx.de
Cc: 627179@bugs.debian.org
Subject: Re: Bug#627179: multistrap: Using retainsources=dir does not retain some sources
Date: Wed, 15 Jun 2011 21:52:17 +0100
[Message part 1 (text/plain, inline)]
On Wed, 15 Jun 2011 17:49:50 +0200
David Kuehling <dvdkhlng@gmx.de> wrote:

> > Neil Williams <codehelp@debian.org> writes:
> > On Wed, 18 May 2011 15:09:44 +0200
> > David Kuehling <dvdkhlng@gmx.de> wrote:
> 
> > Think about this more carefully. The situation is that multistrap is
> > stateless and something can have happened which means that the run
> > when the packages are actually downloaded failed at a later stage
> > (e.g. in the hooks or setupscript) and then got fixed. So a later run
> > of multistrap still needs to go through the status file (because the
> > .debs have been unpacked and deleted) to check if some source packages
> > still need to be downloaded. apt-get install will check the status
> > file and report that the packages are already at the newest
> > version, without downloading anything, so the list has to come from
> > somewhere else. i.e.  the list of downloaded debs is untrustworthy and
> > must be regarded as incomplete.
> 
> Ok, if this is the case, then why do we have to collect source packages
> (dsclist) at 3 places in multistrap.conf?  Wouldn't it be sufficient to
> just do it once, when parsing the status file?

Error handling is the main reason. Unpacking might fail, the archives
might be the only source of data.

Could you have a look at the current SVN revision and let me know how
that matches your tests?

> >> The patch also fixes another bug, not yet reported: multistrap could
> >> have fetched source packages versions that differ from the binary
> >> package versions.
> 
> > That is more about differences in aptsources and debootstrap lines
> > than anything to do with specifying the version. I don't think your
> > patch actually works here. apt-get source will get the latest, just as
> > apt-get install will get the latest. What changes is whether the call
> > is made when aptsources are active or when bootstrap sources are
> > active. It needs to be bootstrap sources. I'd need to have a real
> > example of where apt-get install will download a different version to
> > what apt-get source will download for the same sources - that would be
> > a bug in apt, not multistrap. (Multistrap creates deb-src lines for
> > each source specified, so the versions are expected to be the same
> > from deb to deb-src or else there are problems with the archive.)
> 
> That's exactly the problem: inconsistent versions in the archive or
> archive updates while multistrap runs. 

That sounds like a broken archive. 

I haven't implemented the versioned source call - I remain unconvinced
that a valid archive would cause the download of a source package of a
different version to the binary package.

-- 


Neil Williams
=============
http://www.linux.codehelp.co.uk/

[Message part 2 (application/pgp-signature, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org, Neil Williams <codehelp@debian.org>:
Bug#627179; Package multistrap. (Wed, 15 Jun 2011 21:24:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to David Kuehling <dvdkhlng@gmx.de>:
Extra info received and forwarded to list. Copy sent to Neil Williams <codehelp@debian.org>. (Wed, 15 Jun 2011 21:24:04 GMT) Full text and rfc822 format available.

Message #32 received at 627179@bugs.debian.org (full text, mbox):

From: David Kuehling <dvdkhlng@gmx.de>
To: Neil Williams <codehelp@debian.org>
Cc: 627179@bugs.debian.org
Subject: Re: Bug#627179: multistrap: Using retainsources=dir does not retain some sources
Date: Wed, 15 Jun 2011 23:22:25 +0200
[Message part 1 (text/plain, inline)]
>>>>> "Neil" == Neil Williams <codehelp@debian.org> writes:

> Error handling is the main reason. Unpacking might fail, the archives
> might be the only source of data.

> Could you have a look at the current SVN revision and let me know how
> that matches your tests?

I'm going to have a look at it and test this stuff on friday.  

>> That's exactly the problem: inconsistent versions in the archive or
>> archive updates while multistrap runs.

> That sounds like a broken archive.

> I haven't implemented the versioned source call - I remain unconvinced
> that a valid archive would cause the download of a source package of a
> different version to the binary package.

Are you assuming a non-changing archive?  As soon as the archive changes
non-atomically (with locking applied by the client) I think we're
doomed.  As we're building images from debian sid, changes will be
pretty common.

I'm not sure how many times 'multistrap' performs 'apt-get update'.
Even if it only does it once, the source and binary package indices are
distinct files, and they are retrieved by distinct transactions from the
ftp/http server, so I see no way that you can guarantee consistency
during mirror pushes.

The only atomicity you have is for updating a single index file via
'mv'.
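
As a rough illustration of that single-file atomicity, here is a Python
sketch of the write-then-rename pattern. The function name
`replace_index` is hypothetical, and this is not how any particular
mirroring tool is known to work. Each call swaps one file in a single
step; replacing two index files still takes two separate calls.

```python
import os
import tempfile


def replace_index(path, data):
    """Atomically replace the file at `path` with `data` (bytes)."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory so the rename
    # stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())
        # Readers see either the old file or the new one, never a mix.
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```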

The following scenario: 

 * mirror gets updated, probably first new files put into the pool, then
   the new indices follow, then even later, it's going to delete the
   files no longer referenced by the indices.

 * Concurrently I run 'multistrap' which runs 'apt-get update', fetching
   dists/sid/main/binary-amd64/Packages.bz2 and
   sid/main/source/Sources.bz2 .
 
 * Now I get Packages.bz2 from before the mirror update, and Sources.bz2
   from after the mirror update.  Sources.bz2 is going to have some
   packages updated to newer versions, and won't correspond 100% to
   binaries from Packages.bz2, thus violating the GPL

Of course I guess the error rate will not be too high, at least not
over a normal high-rate internet connection.  

cheers,

David
-- 
GnuPG public key: http://dvdkhlng.users.sourceforge.net/dk.gpg
Fingerprint: B17A DC95 D293 657B 4205  D016 7DEF 5323 C174 7D40
[Message part 2 (application/pgp-signature, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org:
Bug#627179; Package multistrap. (Thu, 16 Jun 2011 08:12:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to Neil Williams <codehelp@debian.org>:
Extra info received and forwarded to list. (Thu, 16 Jun 2011 08:12:04 GMT) Full text and rfc822 format available.

Message #37 received at 627179@bugs.debian.org (full text, mbox):

From: Neil Williams <codehelp@debian.org>
To: David Kuehling <dvdkhlng@gmx.de>
Cc: 627179@bugs.debian.org
Subject: Re: Bug#627179: multistrap: Using retainsources=dir does not retain some sources
Date: Thu, 16 Jun 2011 09:07:58 +0100
[Message part 1 (text/plain, inline)]
On Wed, 15 Jun 2011 23:22:25 +0200
David Kuehling <dvdkhlng@gmx.de> wrote:

> >> That's exactly the problem: inconsistent versions in the archive or
> >> archive updates while multistrap runs.
> 
> > That sounds like a broken archive.
> 
> > I haven't implemented the versioned source call - I remain unconvinced
> > that a valid archive would cause the download of a source package of a
> > different version to the binary package.
> 
> Are you assuming a non-changing archive? 

No, I'm assuming a decent mirroring tool. There is also a need to
understand exactly what apt is doing with apt-get source. I don't think
you've got that clear.

> As soon as the archive changes
> non-atomically (with locking applied by the client) I think we're
> doomed. 

Broken mirror resulting from an inadequate mirroring tool. Not
something which either apt or multistrap can fix or avoid.

> As we're building images from debian sid, changes will be
> pretty common.

Changes in sid may have the source and Arch:all packages arrive before
the buildds have built the binary for that arch but that is not why the
source version string appears in the Packages file (otherwise versions
wouldn't appear in stable and they clearly do.) i.e. using the version
will NOT help you in this situation.

If that's a problem, don't use unstable. (unstable is not meant to be
used for images, that's why we have testing. It's only usually 10 days
behind after all - AND you are assured of the matching version of the
source and binaries for all packages.) Unstable does not promise that
every architecture will always be in sync with the latest source but it
does ensure that the source is retained for as long as any one
supported arch hasn't finished installing the newer version.

> I'm not sure how many times 'multistrap' performs 'apt-get update'.

Every time the sources.list files change during the run. Once for the
bootstrap (which is all we care about here) and once for the
aptsources for the runtime system. Subsequent operations all use the
cache which apt creates from those downloaded files.

> Even if it only does it once, the source and binary package indices are
> distinct files, and they are retrieved by distinct transactions from the
> ftp/http server, so I see no way that you can guarantee consistency
> during mirror pushes.

Mirror pushes update both files simultaneously. The packages are copied
over to an incoming location, then the database is locked, the files are
copied into the pool, the indices are updated and then the original
files are deleted. 

It is the mirror push which ensures that the Packages and Sources files
are synchronised.

> The only atomicity you have is for updating a single index file via
> 'mv'.

And preparing the indices separately then 'mv' each into place is not
going to cause any detectable failure. I think you're chasing your
tail or not using a decent mirroring tool. 

>  * mirror gets updated, probably first new files put into the pool, then
>    the new indices follow, then even later, it's going to delete the
>    files no longer referenced by the indices.

No. The mirror gets updated by putting stuff safely into a local
temporary space, then the database is locked, then updated, the new
files are copied alongside the existing ones, then the prepared indices
are 'mv''d to replace the previous ones, then the old files are removed
and then the database is unlocked. It's as close to atomic as makes no
odds.

>  * Concurrently I run 'multistrap' which runs 'apt-get update', fetching
>    dists/sid/main/binary-amd64/Packages.bz2 and
>    sid/main/source/Sources.bz2 .
>  
>  * Now I get Packages.bz2 from before the mirror update, and Sources.bz2
>    from after the mirror update. 

No you don't. You get Packages.bz2 and Sources.bz2 in sync at the same
time in the same apt-get update call. Indeed in most cases, as apt is
using parallel connections, Packages and Sources are downloaded at the
same time over multiple sockets. What happens afterwards is that apt-get
source uses that cached data to get the sources. 

If a new version has arrived in the meantime then the old source
(the same version as the version of the binary downloaded earlier) will
still be obtained because apt only has cached data for the downloaded
source version, the Sources file already downloaded before the new
version arrived. In most cases, the old version will remain for at
least 10 days because it's the version currently in testing. In other
cases, the version is retained until built by all architectures. Either
way, there is nothing you can do in the apt-get source call to change
that because apt-get source only uses the Sources file which it
downloaded the last time apt-get update was run. If the mirror has
removed that file since then, there is nothing apt can do about it
except ask for apt-get update to be run again.

apt-get source does NOT go to the mirror and look up the latest source
version on the mirror. It goes to its cache of the Sources file which
it previously downloaded in parallel with the Packages file and creates
a http:// address for the .dsc and expects to be able to get that URL
(in the same way as wget would). 
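
The URL construction described above can be sketched in Python as
follows. This is only an illustration of the idea, not apt's actual
code; the function name `dsc_url` and the mirror URL in the example are
hypothetical. It relies on the Directory and Files fields of a Sources
stanza.

```python
def dsc_url(mirror, stanza):
    """Build the .dsc URL for one Sources stanza (given as text)."""
    directory = None
    dsc = None
    in_files = False
    for line in stanza.splitlines():
        if line.startswith(" "):
            # Indented lines under "Files:" are "<md5> <size> <name>".
            if in_files:
                parts = line.split()
                if len(parts) == 3 and parts[2].endswith(".dsc"):
                    dsc = parts[2]
        else:
            key, _, value = line.partition(":")
            in_files = key == "Files"
            if key == "Directory":
                directory = value.strip()
    if directory is None or dsc is None:
        raise ValueError("stanza lacks Directory or a .dsc entry")
    return f"{mirror.rstrip('/')}/{directory}/{dsc}"


# Example with made-up checksums, sizes and version:
cached = (
    "Package: bash\n"
    "Version: 4.1-3\n"
    "Directory: pool/main/b/bash\n"
    "Files:\n"
    " 00000000000000000000000000000000 1234 bash_4.1-3.dsc\n"
    " 11111111111111111111111111111111 5678 bash_4.1.orig.tar.gz\n"
)
print(dsc_url("http://mirror.example.org/debian/", cached))
```

Because the URL comes entirely from the cached stanza, a newer upload
on the mirror does not change which file is requested.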

If that file has been removed since the cache was updated, there is
nothing apt (or multistrap) can do about that. However, that is only
likely to happen with packages where a new version is uploaded to
unstable without allowing the 10 days for a testing migration.
Generally, such uploads are made because the package *won't* migrate
because of an RC bug which makes the argument that you shouldn't be
expecting to create an image using packages which could be susceptible
to such bugs in the first place.

> Sources.bz2 is going to have some
>    packages updated to newer versions, and won't correspond 100% to
>    binaries from Packages.bz2, thus violating the GPL

Not true - unless you're talking about waiting for packages to arrive
from the buildd's, but that is a very small space of time normally and
if that bothers you, just don't use unstable for the kind of tasks
where you need synchronised sources. Unstable does NOT make that
promise and there is nothing apt or multistrap can do about it.

> Of course I guess the error rate will not be too high, at least not
> over a normal high-rate internet connection.  

It's nothing to do with the download speeds.

-- 


Neil Williams
=============
http://www.linux.codehelp.co.uk/

[Message part 2 (application/pgp-signature, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org, Neil Williams <codehelp@debian.org>:
Bug#627179; Package multistrap. (Thu, 16 Jun 2011 09:24:28 GMT) Full text and rfc822 format available.

Acknowledgement sent to David Kuehling <dvdkhlng@gmx.de>:
Extra info received and forwarded to list. Copy sent to Neil Williams <codehelp@debian.org>. (Thu, 16 Jun 2011 09:24:29 GMT) Full text and rfc822 format available.

Message #42 received at 627179@bugs.debian.org (full text, mbox):

From: David Kuehling <dvdkhlng@gmx.de>
To: Neil Williams <codehelp@debian.org>
Cc: 627179@bugs.debian.org
Subject: Re: Bug#627179: multistrap: Using retainsources=dir does not retain some sources
Date: Thu, 16 Jun 2011 11:22:21 +0200
[Message part 1 (text/plain, inline)]
>>>>> "Neil" == Neil Williams <codehelp@debian.org> writes:

Thought I'd just point out where my paranoia about proper source version
matching comes from:

 * we're using architecture powerpcspe from debian-ports.  debian-ports
   doesn't carry sources, and is out of sync with normal debian mirrors,
   which makes it pretty difficult to satisfy the GPL.

 * we're using our own, custom mirroring tool [1] to overcome the
   limitations of debian-ports.  this tool is named 'debparanoia' for a
   reason, as it double checks that matching source packages are present
   for all .debs.

 * Since debparanoia is used for mirroring as well as for
   "license-checking" our images, I don't really care if multistrap
   does the source package version check.  I just thought it would be
   good anyways to eliminate the slightest chance of 'apt-get source'
   not satisfying the GPL by getting wrong package versions, without the
   user noticing.

Below I'll try again to prove my point, sorry if this is getting off
topic and wasting your time :)

> On Wed, 15 Jun 2011 23:22:25 +0200
> David Kuehling <dvdkhlng@gmx.de> wrote:
>> Are you assuming a non-changing archive?

> No, I'm assuming a decent mirroring tool. There is also a need to
> understand exactly what apt is doing with apt-get source. I don't
> think you've got that clear.

I'm sorry if I gave the impression of not understanding apt or archives
:) Maybe the last mail was just written too hastily.  I think I
understand that stuff pretty well.

>> As soon as the archive changes non-atomically (with locking applied
>> by the client) I think we're doomed.

> Broken mirror resulting from an inadequate mirroring tool. Not
> something which either apt or multistrap can fix or avoid.

I'm willing to believe that debian mirror updates are atomic.

[..]

>> * Concurrently I run 'multistrap' which runs 'apt-get update',
>> fetching dists/sid/main/binary-amd64/Packages.bz2 and
>> sid/main/source/Sources.bz2 .
>> 
>> * Now I get Packages.bz2 from before the mirror update, and
>> Sources.bz2 from after the mirror update.

> No you don't. You get Packages.bz2 and Sources.bz2 in sync at the same
> time in the same apt-get update call. Indeed in most cases, as apt is
> using parallel connections, Packages and Sources are downloaded at the
> same time over multiple sockets. What happens afterwards is that
> apt-get source uses that cached data to get the sources.

Well this is the part that only works if you cross your fingers.
Nothing guarantees that 'apt-get update' schedules the packages.bz2 and
sources.bz2 for synchronous download.  In fact typing in 'apt-get
update' on my pc, it first downloads 4 Sources files, then 4 Packages
files for me.

This race can be detected by checking Packages.bz2 and Sources.bz2
against the checksums present in the Release file.  Not sure whether
that's implemented.  For me 'multistrap' happily uses repositories
without checksums in the Release file.

> If a new version has arrived in the meantime then the old source (the
> same version as the version of the binary downloaded earlier) will
> still be obtained because apt only has cached data for the downloaded
> source version, the Sources file already downloaded before the new
> version arrived. In most cases, the old version will remain for at
> least 10 days because it's the version currently in testing. 
[..]

I do not contest that debian archives carry source packages for all
binary packages in the pool.  I only contend that the .deb packages
referenced by apt's cache do not necessarily match the source packages
referenced by the cache.  This condition would result in license
violation for people who rely on 'apt-get source' to satisfy the GPL.
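
A hedged Python sketch of such a consistency check: given the binary
stanzas and the versions listed per source package, report every binary
whose expected source version is absent. The names `expected_source`
and `find_mismatches` are hypothetical. Neither multistrap nor apt is
claimed to do this. The sketch honours a "Source: name (version)"
field, which is how a binary whose version differs from its source
(e.g. after a binary-only rebuild) names the source version.

```python
def expected_source(fields):
    """Return (source name, source version) implied by a binary stanza dict."""
    source = fields.get("Source", "")
    name, _, rest = source.partition(" ")
    name = name or fields["Package"]
    # An explicit "(version)" after the name overrides the binary version.
    version = rest.strip("() ") or fields["Version"]
    return name, version


def find_mismatches(binaries, sources):
    """List (package, source, version) for binaries lacking a matching source.

    binaries: iterable of dicts with Package, Version and optional Source.
    sources:  dict mapping source name -> set of available versions.
    """
    missing = []
    for fields in binaries:
        name, version = expected_source(fields)
        if version not in sources.get(name, set()):
            missing.append((fields["Package"], name, version))
    return missing
```

An empty result means every binary has a source entry of the same
version in the index it was checked against.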

A version mismatch will occur exactly when a mirror update occurred in
between the download of sources.bz2 and packages.bz2.  You want to tell
me that this is not possible, however your description of the process
makes it look like it is possible, though unlikely.

The only way to prevent such a race would be to either (a) prevent
mirror updates (i.e. 'lock' the archive) during 'apt-get update'
sessions, or (b) to guarantee that Sources.bz2 and Packages.bz2 download
starts at exactly the same point in time.

I think neither (a) nor (b) can be implemented.  You can only implement
(c): ensure consistency with checksums in the release file, and retry
download ad infinitum, until checksums match.
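
Option (c) could look roughly like the following Python sketch. The
names `parse_release` and `index_matches` are hypothetical, and the
sketch assumes the Release file carries a "SHA256:" section whose
indented lines have the form "<digest> <size> <path>". A caller would
re-download both indices until each one matches.

```python
import hashlib


def parse_release(text, section="SHA256"):
    """Map each path listed under `section` to its (hex digest, size)."""
    entries, active = {}, False
    for line in text.splitlines():
        if line.startswith(" "):
            if active:
                parts = line.split()
                if len(parts) == 3:
                    digest, size, path = parts
                    entries[path] = (digest, int(size))
        else:
            # A non-indented line starts a new field; track whether it
            # is the checksum section we want.
            active = line.split(":", 1)[0] == section
    return entries


def index_matches(entries, path, data):
    """True if the downloaded bytes `data` match the Release entry for `path`."""
    if path not in entries:
        return False
    digest, size = entries[path]
    return len(data) == size and hashlib.sha256(data).hexdigest() == digest
```

If Packages.bz2 and Sources.bz2 both match the same Release file, they
belong to the same archive state, regardless of when each download
started.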

>> Sources.bz2 is going to have some packages updated to newer versions,
>> and won't correspond 100% to binaries from Packages.bz2, thus
>> violating the GPL

> Not true - unless you're talking about waiting for packages to arrive
> from the buildd's, but that is a very small space of time normally and
> if that bothers you, just don't use unstable for the kind of tasks
> where you need synchronised sources. Unstable does NOT make that
> promise and there is nothing apt or multistrap can do about it.

Note that I wasn't talking about packages in the pool, but only about
the indices that were "snapshotted" by 'apt-get update'.

>> Of course I guess the error rate will not be too high, at least not
>> over a normal high-rate internet connection.

> It's nothing to do with the download speeds.

Well, try 'apt-get update' over a modem line and notice how much time
passes between fetching sources.bz2 and fetching the packages.bz2.
Pretty long time for a mirror update occurring in between?!

cheers,

David

[1] http://sourceforge.net/projects/debparanoia/
-- 
GnuPG public key: http://dvdkhlng.users.sourceforge.net/dk.gpg
Fingerprint: B17A DC95 D293 657B 4205  D016 7DEF 5323 C174 7D40
[Message part 2 (application/pgp-signature, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org:
Bug#627179; Package multistrap. (Thu, 16 Jun 2011 11:21:27 GMT)

Acknowledgement sent to Neil Williams <codehelp@debian.org>:
Extra info received and forwarded to list. (Thu, 16 Jun 2011 11:21:30 GMT)

Message #47 received at 627179@bugs.debian.org:

From: Neil Williams <codehelp@debian.org>
To: David Kuehling <dvdkhlng@gmx.de>
Cc: 627179@bugs.debian.org
Subject: Re: Bug#627179: multistrap: Using retainsources=dir does not retain some sources
Date: Thu, 16 Jun 2011 12:18:10 +0100
[Message part 1 (text/plain, inline)]
On Thu, 16 Jun 2011 11:22:21 +0200
David Kuehling <dvdkhlng@gmx.de> wrote:

> >>>>> "Neil" == Neil Williams <codehelp@debian.org> writes:
> 
> Thought I'd just point out where my paranoia about proper source version
> matching comes from:
> 
>  * we're using architecture powerpcspe from debian-ports.  debian-ports
>    doesn't carry sources, and is out of sync with normal debian mirrors,
>    which makes it pretty difficult to satisfy the GPL.

pool-powerpcspe/main/e/eglibc/eglibc-source_2.11.1-2+powerpcspe1_all.deb

That's from:

http://ftp.debian-ports.org/debian/dists/unreleased/main/binary-powerpcspe/Packages

http://ftp.debian-ports.org/debian/pool-powerpcspe/main/e/eglibc/

The problem is this:

http://ftp.debian-ports.org/debian/dists/unreleased/main/source/

Sources is a zero length file.

There is nothing apt or multistrap can do to help you here.

The problem is that the source DOES exist, even for unreleased, but in
a non-standard pool location which is NOT listed in the Sources file.
This could be a bug in debian-ports. Hector will try and have a look at
that.

>  * we're using our own, custom mirroring tool [1] to overcome the
>    limitations of debian-ports.  this tool is named 'debparanoia' for a
>    reason, as it double checks that matching source packages are present
>    for all .debs.

Use reprepro, which is what Emdebian uses to twist the debian-ports
stuff into some kind of standard shape, although the problem of
unreleased sources cannot be fixed that way. reprepro updates are
atomic.

>  * Since debparanoia is used for mirroring as well as for
>    "license-checking" our images, I don't really care if multistrap
>    does the source package version check.  I just thought it would be
>    good anyway to eliminate the slightest chance of 'apt-get source'
>    not satisfying the GPL by getting wrong package versions, without the
>    user noticing.

If the Sources file was not zero bytes, that could work.
 
> >> * Concurrently I run 'multistrap' which runs 'apt-get update',
> >> fetching dists/sid/main/binary-amd64/Packages.bz2 and
> >> sid/main/source/Sources.bz2 .
> >> 
> >> * Now I get Packages.bz2 from before the mirror update, and
> >> Sources.bz2 from after the mirror update.
> 
> > No you don't. You get Packages.bz2 and Sources.bz2 in sync at the same
> > time in the same apt-get update call. Indeed in most cases, as apt is
> > using parallel connections, Packages and Sources are downloaded at the
> > same time over multiple sockets. What happens afterwards is that
> > apt-get source uses that cached data to get the sources.
> 
> Well this is the part that only works if you cross your fingers.
> Nothing guarantees that 'apt-get update' schedules the packages.bz2 and
> sources.bz2 for synchronous download.  In fact typing in 'apt-get
> update' on my pc, it first downloads 4 Sources files, then 4 Packages
> files for me.

Check the http-method carefully, it opens sockets for each of these
files BEFORE indicating that the download has started and will commonly
show parallel downloads if your connection is slow enough or your
system far enough out of date from the last update.

> This race can be detected by checking Packages.bz2 and Sources.bz2 with
> the checksums present in the Release.  Not sure whether that's
> implemented.  For me 'multistrap' happily uses repositories without
> checksums in the Release file.

Bad mirror.

> I do not contest that debian archives carry source packages for all
> binary packages in the pool.  I only contend that the .deb packages
> referenced by apt's cache do not necessarily match the source packages
> referenced by the cache.  This condition would result in license violation
> for people who rely on 'apt-get source' to satisfy the GPL.

That doesn't follow.

> A version mismatch will occur exactly when a mirror update occurred in
> between the download of Sources.bz2 and Packages.bz2.  You want to tell
> me that this is not possible, however your description of the process
> makes it look like it is possible, though unlikely.

How? apt opens the sockets and then starts the download. If the file
changes, the download will abort. Both sockets are open before the
download starts. Therefore, if the files download successfully, the
files must be in the same state as when the sockets were originally
opened. Are you trying to say that files change in the microsecond
between the creation of one socket and the creation of the next socket
on a multi-core server??

This isn't a race in apt, it's an inherent problem of using unofficial
ports with unreleased patches and a broken Source listing.

If you can prove such a race, report it as a bug in apt. It has nothing
to do with multistrap.
 
> The only way to prevent such a race would be to either (a) prevent
> mirror updates (i.e. 'lock' the archive) during 'apt-get update'
> sessions, or (b) to guarantee that Sources.bz2 and Packages.bz2 download
> starts at exactly the same point in time.

Which, apart from the time that elapses between the opening of one
socket and the opening of the next, is already implemented.

> I think neither (a) nor (b) can be implemented.  You can only implement
> (c): ensure consistency with checksums in the release file, and retry
> download ad infinitum, until checksums match.

False.
 
> Note that I wasn't talking about packages in the pool, but only about
> the indices that were "snapshotted" by 'apt-get update'.

... which are downloaded using parallel sockets from the mirror which
uses an atomic process to create them...

> >> Of course I guess the error rate will not be too high, at least not
> >> over a normal high-rate internet connection.
> 
> > It's nothing to do with the download speeds.
> 
> Well, try 'apt-get update' over a modem line and notice how much time
> passes between fetching Sources.bz2 and fetching Packages.bz2.
> Pretty long time for a mirror update occurring in between?!

You are being misled by the output messages of apt which only start
listing stuff when *data is sent over the open socket*, not when the
socket itself is opened.

I used apt over 28k modems for a long, long time.

-- 


Neil Williams
=============
http://www.linux.codehelp.co.uk/


Information forwarded to debian-bugs-dist@lists.debian.org, Neil Williams <codehelp@debian.org>:
Bug#627179; Package multistrap. (Fri, 17 Jun 2011 16:57:03 GMT)

Acknowledgement sent to David Kuehling <dvdkhlng@gmx.de>:
Extra info received and forwarded to list. Copy sent to Neil Williams <codehelp@debian.org>. (Fri, 17 Jun 2011 16:57:03 GMT)

Message #52 received at 627179@bugs.debian.org:

From: David Kuehling <dvdkhlng@gmx.de>
To: Neil Williams <codehelp@debian.org>
Cc: 627179@bugs.debian.org
Subject: Re: Bug#627179: multistrap: Using retainsources=dir does not retain some sources
Date: Fri, 17 Jun 2011 18:54:31 +0200
[Message part 1 (text/plain, inline)]
>>>>> "Neil" == Neil Williams <codehelp@debian.org> writes:

> Could you have a look at the current SVN revision and let me know how
> that matches your tests?

I just tested with r8024 and it now seems to correctly retrieve all
source packages (i.e. the files in the retainsources directory now pass
the debparanoia license check).
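
The kind of check meant here can be sketched as follows. This is not
debparanoia's actual code; the function names are invented for
illustration. It only assumes the standard index layout: a binary stanza
may carry "Source: name" or "Source: name (version)", and when the field
is absent the source package shares the binary's name and version, which
is the case that originally tripped up retainsources.

```python
import re


def parse_stanzas(text):
    """Split a Packages/Sources-style index into a list of field dicts."""
    stanzas = []
    for block in text.strip().split("\n\n"):
        fields = {}
        key = None
        for line in block.splitlines():
            if line[:1] in (" ", "\t") and key:
                # Continuation line of a multi-line field.
                fields[key] += "\n" + line.strip()
            elif ":" in line:
                key, value = line.split(":", 1)
                fields[key] = value.strip()
        if fields:
            stanzas.append(fields)
    return stanzas


def source_of(binary):
    """Return (source_name, source_version) for one binary stanza."""
    src = binary.get("Source")
    if not src:
        # No Source field: fall back to the binary package's own name.
        return binary["Package"], binary["Version"]
    m = re.match(r"(\S+)(?:\s+\(([^)]+)\))?", src)
    return m.group(1), m.group(2) or binary["Version"]


def missing_sources(packages_text, sources_text):
    """List (name, version) pairs needed by binaries but absent from Sources."""
    have = {(s["Package"], s["Version"]) for s in parse_stanzas(sources_text)}
    need = {source_of(b) for b in parse_stanzas(packages_text)}
    return sorted(need - have)
```

An empty result means every binary in the Packages data has a source entry
of exactly the matching version; anything else lists what is missing.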

cheers,

David
-- 
GnuPG public key: http://dvdkhlng.users.sourceforge.net/dk.gpg
Fingerprint: B17A DC95 D293 657B 4205  D016 7DEF 5323 C174 7D40

Reply sent to Neil Williams <codehelp@debian.org>:
You have taken responsibility. (Sun, 19 Jun 2011 11:07:24 GMT)

Notification sent to David Kuehling <dvdkhlng@gmx.de>:
Bug acknowledged by developer. (Sun, 19 Jun 2011 11:07:28 GMT)

Message #57 received at 627179-close@bugs.debian.org:

From: Neil Williams <codehelp@debian.org>
To: 627179-close@bugs.debian.org
Subject: Bug#627179: fixed in multistrap 2.1.15
Date: Sun, 19 Jun 2011 11:03:12 +0000
Source: multistrap
Source-Version: 2.1.15

We believe that the bug you reported is fixed in the latest version of
multistrap, which is due to be installed in the Debian FTP archive:

multistrap_2.1.15.dsc
  to main/m/multistrap/multistrap_2.1.15.dsc
multistrap_2.1.15.tar.gz
  to main/m/multistrap/multistrap_2.1.15.tar.gz
multistrap_2.1.15_all.deb
  to main/m/multistrap/multistrap_2.1.15_all.deb



A summary of the changes between this version and the previous one is
attached.

Thank you for reporting the bug, which will now be closed.  If you
have further comments please address them to 627179@bugs.debian.org,
and the maintainer will reopen the bug report if appropriate.

Debian distribution maintenance software
pp.
Neil Williams <codehelp@debian.org> (supplier of updated multistrap package)

(This message was generated automatically at their request; if you
believe that there is a problem with it please contact the archive
administrators by mailing ftpmaster@debian.org)


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Format: 1.8
Date: Sat, 18 Jun 2011 16:17:13 +0100
Source: multistrap
Binary: multistrap
Architecture: source all
Version: 2.1.15
Distribution: unstable
Urgency: low
Maintainer: Neil Williams <codehelp@debian.org>
Changed-By: Neil Williams <codehelp@debian.org>
Description: 
 multistrap - multiple repository bootstrap based on apt
Closes: 627179 630314
Changes: 
 multistrap (2.1.15) unstable; urgency=low
 .
   * Clean up the retainsources behaviour (Closes: #627179)
   * Implement some code for omitpreinst support.
   * typo fix in manpage (Closes: #630314)
Checksums-Sha1: 
 77354dc312222989a3ec03a541b9c19b7f723265 1014 multistrap_2.1.15.dsc
 411d18ea5375651c1ea207ce803cd3ad31f21929 181730 multistrap_2.1.15.tar.gz
 1d9928c6deb4612442360082be6b326429b3f23c 97628 multistrap_2.1.15_all.deb
Checksums-Sha256: 
 2f5e30bc293b0a8e73f944f85cdaf7ce5cea19a2f3616cea5ab3e4394dd3da3a 1014 multistrap_2.1.15.dsc
 262926aac6220daee3550d575f8ef7e7f72f0f4801ab97e170c976b834cb08a5 181730 multistrap_2.1.15.tar.gz
 70ba3304a88bd996b1cbf7ce38a975c34b619a1a4ed0ad1abd4c5caf049fe752 97628 multistrap_2.1.15_all.deb
Files: 
 f01ccc3bdb7efc225ce04941a7ce345d 1014 utils optional multistrap_2.1.15.dsc
 e65f4a3b272664345655911d7d022129 181730 utils optional multistrap_2.1.15.tar.gz
 3544b6d35b28b8305e2ec82ee1f14e7b 97628 admin optional multistrap_2.1.15_all.deb

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)

iEYEARECAAYFAk390yUACgkQiAEJSii8s+NCrgCgtUi9bglmI1TAzhCh3BM+J8WT
0UkAn1Ekq1OoWk9ZCcCAUkvWSqL7UyGO
=IJNX
-----END PGP SIGNATURE-----





Information forwarded to debian-bugs-dist@lists.debian.org, Neil Williams <codehelp@debian.org>:
Bug#627179; Package multistrap. (Mon, 20 Jun 2011 10:12:13 GMT)

Acknowledgement sent to David Kuehling <dvdkhlng@gmx.de>:
Extra info received and forwarded to list. Copy sent to Neil Williams <codehelp@debian.org>. (Mon, 20 Jun 2011 10:12:16 GMT)

Message #62 received at 627179@bugs.debian.org:

From: David Kuehling <dvdkhlng@gmx.de>
To: Neil Williams <codehelp@debian.org>
Cc: 627179@bugs.debian.org
Subject: Re: Bug#627179: multistrap: Using retainsources=dir does not retain some sources
Date: Mon, 20 Jun 2011 12:09:05 +0200
[Message part 1 (text/plain, inline)]
>>>>> "Neil" == Neil Williams <codehelp@debian.org> writes:

>> A version mismatch will occur exactly when a mirror update occurred in
>> between the download of Sources.bz2 and Packages.bz2.  You want to
>> tell me that this is not possible, however your description of the
>> process makes it look like it is possible, though unlikely.

> How? apt opens the sockets and then starts the download. If the file
> changes, the download will abort. Both sockets are open before the
> download starts. Therefore, if the files download successfully, the
> files must be in the same state as when the sockets were originally
> opened. Are you trying to say that files change in the microsecond
> between the creation of one socket and the creation of the next socket
> on a multi-core server??

[..]

> Which, apart from the time which elapses between the opening of one
> socket and the opening of the next is already implemented.

Maybe if *I* ran a multi-core server and sat on the same LAN as Debian's
mirror, the opening of two sockets would be nearly synchronous and happen
in a "microsecond".

However, I'm pretty far away from the server, and the 3-way handshake to
open a socket can vary a lot in its timing depending on latencies and
packet error rate.

With a realistic socket setup jitter of 100ms and one mirror update per
day, you'll fetch the wrong index file once in about 10^6 downloads.
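
The estimate can be reproduced with a few lines of arithmetic. The model
is an assumption made here for illustration: the mirror update happens at
an instant uniformly distributed over the day, and a mismatch occurs when
that instant falls inside the window between the two index requests.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def mismatch_probability(window_s, updates_per_day=1):
    """Chance that a mirror update falls inside the window of one download."""
    return window_s * updates_per_day / SECONDS_PER_DAY


p = mismatch_probability(0.1)  # 100 ms of socket setup jitter
print(f"probability per download: {p:.2e}")
print(f"roughly one mismatch in {round(1 / p)} downloads")
```

With these numbers the result is about one mismatch in 864,000 downloads,
which is the order of magnitude quoted above.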

Relying on uncontrollable network parameters to avoid races sounds
like a bad idea to me.  Betting on non-deterministic software to
function, because malfunctioning looks unlikely, is not the best idea
either.

cheers,

David
-- 
GnuPG public key: http://dvdkhlng.users.sourceforge.net/dk.gpg
Fingerprint: B17A DC95 D293 657B 4205  D016 7DEF 5323 C174 7D40

Bug archived. Request was from Debbugs Internal Request <owner@bugs.debian.org> to internal_control@bugs.debian.org. (Thu, 28 Jul 2011 07:37:28 GMT)


Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Thu Apr 24 23:00:40 2014; Machine Name: buxtehude.debian.org
