Debian Bug report logs - #372712
apt: Shouldn't download pdiffs in all cases

Package: apt; Maintainer for apt is APT Development Team <deity@lists.debian.org>; Source for apt is src:apt.

Reported by: Mike Hommey <mh+reportbug@glandium.org>

Date: Sun, 11 Jun 2006 09:33:20 UTC

Severity: wishlist

Found in versions apt/0.6.44, apt/0.6.44.2

Fixed in version 0.7.25.1

Done: Daniel Hartwig <mandyke@gmail.com>

Bug is archived. No further changes may be made.


Report forwarded to debian-bugs-dist@lists.debian.org, APT Development Team <deity@lists.debian.org>:
Bug#372712; Package apt. Full text and rfc822 format available.

Acknowledgement sent to Mike Hommey <mh+reportbug@glandium.org>:
New Bug report received and forwarded. Copy sent to APT Development Team <deity@lists.debian.org>. Full text and rfc822 format available.

Message #5 received at submit@bugs.debian.org (full text, mbox):

From: Mike Hommey <mh+reportbug@glandium.org>
To: Debian Bug Tracking System <submit@bugs.debian.org>
Subject: apt: Shouldn't download pdiffs in all cases
Date: Sun, 11 Jun 2006 11:20:47 +0200
Package: apt
Version: 0.6.44
Severity: wishlist


When you don't apt-get update for a while, downloading all the pdiffs
instead of the full file is actually much longer. There should be a limit
of days without update after which apt would get the full file and not the
pdiffs.

PS: Is there a way to totally disable the pdiff stuff? With decent
bandwidth, it actually takes more time than downloading the full file...
(Or is the goal to reduce the bandwidth on the server side?)

-- System Information:
Debian Release: testing/unstable
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: i386 (i686)
Shell:  /bin/sh linked to /bin/bash
Kernel: Linux 2.6.16-1-686
Locale: LANG=ja_JP.UTF-8, LC_CTYPE=ja_JP.UTF-8 (charmap=UTF-8)

Versions of packages apt depends on:
ii  libc6                         2.3.6-7    GNU C Library: Shared libraries
ii  libgcc1                       1:4.1.0-3  GCC support library
ii  libstdc++6                    4.1.0-3    The GNU Standard C++ Library v3

Versions of packages apt recommends:
ii  debian-archive-keyring        2006.01.18 GnuPG archive keys of the Debian a

-- no debconf information



Information forwarded to debian-bugs-dist@lists.debian.org, APT Development Team <deity@lists.debian.org>:
Bug#372712; Package apt. Full text and rfc822 format available.

Acknowledgement sent to Goswin von Brederlow <brederlo@informatik.uni-tuebingen.de>:
Extra info received and forwarded to list. Copy sent to APT Development Team <deity@lists.debian.org>. Full text and rfc822 format available.

Message #10 received at submit@bugs.debian.org (full text, mbox):

From: Goswin von Brederlow <brederlo@informatik.uni-tuebingen.de>
To: Mike Hommey <mh+reportbug@glandium.org>
Cc: 372712@bugs.debian.org, Debian Bug Tracking System <submit@bugs.debian.org>
Subject: Re: Bug#372712: apt: Shouldn't download pdiffs in all cases
Date: Sun, 11 Jun 2006 19:29:10 +0200
Mike Hommey <mh+reportbug@glandium.org> writes:

> Package: apt
> Version: 0.6.44
> Severity: wishlist
>
>
> When you don't apt-get update for a while, downloading all the pdiffs
> instead of the full file is actually much longer. There should be a limit
> of days without update after which apt would get the full file and not the
> pdiffs.

There is a limit on the number of pdiff files available. With one file
per day that translates 1:1 into days.

I don't get why it should be slower to download the diff files than
the full file unless you are using ftp. With http the index is fetched
(round trip 1) and then all needed diff files (round trip 2). So
downloading should never be slower.

Does the patching take longer than downloading and bunzipping the full
file on your system? I would think that even 10 days of patching is
faster than bunzipping the file, but that is just a guess.

> PS: Is there a way to totally disable the pdiff stuff? With decent
> bandwidth, it actually takes more time than downloading the full file...
> (Or is the goal to reduce the bandwidth on the server side?)

The primary goal was to save all that download time every day on the
slow modem/dsl line. Probably nobody was thinking about GBit to the
next mirror and 2 weeks worth of diffs.

Maybe the number of diffs kept is too big. Maybe not. To decide that
one needs more info about your (and lots of other people's) network
structure and update seeds/traffic and then find a good compromise.

MfG
        Goswin



Information forwarded to debian-bugs-dist@lists.debian.org, APT Development Team <deity@lists.debian.org>:
Bug#372712; Package apt. Full text and rfc822 format available.

Acknowledgement sent to Goswin von Brederlow <brederlo@informatik.uni-tuebingen.de>:
Extra info received and forwarded to list. Copy sent to APT Development Team <deity@lists.debian.org>. Full text and rfc822 format available.

Information forwarded to debian-bugs-dist@lists.debian.org, APT Development Team <deity@lists.debian.org>:
Bug#372712; Package apt. Full text and rfc822 format available.

Acknowledgement sent to Mike Hommey <mh@glandium.org>:
Extra info received and forwarded to list. Copy sent to APT Development Team <deity@lists.debian.org>. Full text and rfc822 format available.

Message #20 received at 372712@bugs.debian.org (full text, mbox):

From: Mike Hommey <mh@glandium.org>
To: Goswin von Brederlow <brederlo@informatik.uni-tuebingen.de>
Cc: 372712@bugs.debian.org
Subject: Re: Bug#372712: apt: Shouldn't download pdiffs in all cases
Date: Sun, 11 Jun 2006 21:59:34 +0200
On Sun, Jun 11, 2006 at 07:29:10PM +0200, Goswin von Brederlow <brederlo@informatik.uni-tuebingen.de> wrote:
> Mike Hommey <mh+reportbug@glandium.org> writes:
> 
> > Package: apt
> > Version: 0.6.44
> > Severity: wishlist
> >
> >
> > When you don't apt-get update for a while, downloading all the pdiffs
> > instead of the full file is actually much longer. There should be a limit
> > of days without update after which apt would get the full file and not the
> > pdiffs.
> 
> There is a limit on the number of pdiff files available. With one file
> per day that translates 1:1 into days.
> 
> I don't get why it should be slower to download the diff files than
> the full file unless you are using ftp. With http the index is fetched
> (round trip 1) and then all needed diff files (round trip 2). So
> downloading should never be slower.
> 
> Does the patching take longer than downloading and bunzipping the full
> file on your system? I would think that even 10 days of patching is
> faster than bunzipping the file, but that is just a guess.

The computer on which I got this shock had not been updated for a
month, and yet, had to download all the diffs for more than 30 days !

It seems the operation is hard to quantify in time, since downloading and
patching appear to be done sequentially (download, patch, download,
patch, etc.).

I don't know either what the speed indicator in apt gives then, but it's
only giving 12KB/s where I can download full files at 700KB/s...

Note that I use http, not ftp.

> > PS: Is there a way to totally disable the pdiff stuff? With decent
> > bandwidth, it actually takes more time than downloading the full file...
> > (Or is the goal to reduce the bandwidth on the server side?)
> 
> The primary goal was to save all that download time every day on the
> slow modem/dsl line. Probably nobody was thinking about GBit to the
> next mirror and 2 weeks worth of diffs.
> 
> Maybe the number of diffs kept is too big. Maybe not. To decide that
> one needs more info about your (and lots of other people's) network
> structure and update seeds/traffic and then find a good compromise.

I usually apt-get update every other day, and roughly download at
700KB/s.

If you need more testing and numbers, tell me what you would like me to
provide you, and I'll try to give you facts rather than impressions :)

Mike



Information forwarded to debian-bugs-dist@lists.debian.org, APT Development Team <deity@lists.debian.org>:
Bug#372712; Package apt. Full text and rfc822 format available.

Acknowledgement sent to Goswin von Brederlow <brederlo@informatik.uni-tuebingen.de>:
Extra info received and forwarded to list. Copy sent to APT Development Team <deity@lists.debian.org>. Full text and rfc822 format available.

Message #25 received at 372712@bugs.debian.org (full text, mbox):

From: Goswin von Brederlow <brederlo@informatik.uni-tuebingen.de>
To: Mike Hommey <mh@glandium.org>
Cc: Goswin von Brederlow <brederlo@informatik.uni-tuebingen.de>, 372712@bugs.debian.org
Subject: Re: Bug#372712: apt: Shouldn't download pdiffs in all cases
Date: Sun, 11 Jun 2006 23:12:06 +0200
Mike Hommey <mh@glandium.org> writes:

> On Sun, Jun 11, 2006 at 07:29:10PM +0200, Goswin von Brederlow <brederlo@informatik.uni-tuebingen.de> wrote:
>> Mike Hommey <mh+reportbug@glandium.org> writes:
>> 
>> > Package: apt
>> > Version: 0.6.44
>> > Severity: wishlist
>> >
>> >
>> > When you don't apt-get update for a while, downloading all the pdiffs
>> > instead of the full file is actually much longer. There should be a limit
>> > of days without update after which apt would get the full file and not the
>> > pdiffs.
>> 
>> There is a limit on the number of pdiff files available. With one file
>> per day that translates 1:1 into days.
>> 
>> I don't get why it should be slower to download the diff files than
>> the full file unless you are using ftp. With http the index is fetched
>> (round trip 1) and then all needed diff files (round trip 2). So
>> downloading should never be slower.
>> 
>> Does the patching take longer than downloading and bunzipping the full
>> file on your system? I would think that even 10 days of patching is
>> faster than bunzipping the file, but that is just a guess.
>
> The computer on which I got this shock had not been updated for a
> month, and yet, had to download all the diffs for more than 30 days !
>
> It seems the operation is hard to quantify in time, since downloading and
> patching appear to be done sequentially (download, patch, download,
> patch, etc.).
>
> I don't know either what the speed indicator in apt gives then, but it's
> only giving 12KB/s where I can download full files at 700KB/s...
>
> Note that I use http, not ftp.

Maybe I'm wrong, but I asked one of the apt maintainers before how the
downloading works. It should fetch the index, find all the patches
needed from the old Packages file to the current one and then fire off
all the downloads together using the http keep-alive option. There
shouldn't be a delay between downloads. The patching should work in
parallel with downloads.

Anyway, that is the theory.
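The pipelining described here (and the two-thread design Andrea Mennucc later suggests from debdelta) can be sketched as a producer/consumer pair. This is a minimal illustration under stated assumptions, not apt's code: `fetch` is a hypothetical stand-in for an HTTP request on a kept-alive connection, and `apply_patch` is whatever applies one diff. Threads are adequate here because the download side is I/O-bound.

```python
import queue
import threading


def fetch(name):
    # Hypothetical stand-in for an HTTP GET on a kept-alive connection.
    return f"patch-body:{name}"


def pipelined_update(patch_names, apply_patch):
    """Download patches in one thread while the calling thread applies them in order."""
    q = queue.Queue(maxsize=4)  # bounded: the downloader stays only a few patches ahead
    done = object()             # sentinel marking the end of the download stream

    def downloader():
        for name in patch_names:
            q.put((name, fetch(name)))
        q.put(done)

    worker = threading.Thread(target=downloader, daemon=True)
    worker.start()
    applied = []
    while True:
        item = q.get()
        if item is done:
            break
        name, body = item
        apply_patch(body)       # runs while the next patch is being downloaded
        applied.append(name)
    worker.join()
    return applied
```

Because there is a single producer and a FIFO queue, patches are still applied strictly in download order, which a chain of diffs requires.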

>> > PS: Is there a way to totally disable the pdiff stuff? With decent
>> > bandwidth, it actually takes more time than downloading the full file...
>> > (Or is the goal to reduce the bandwidth on the server side?)
>> 
>> The primary goal was to save all that download time every day on the
>> slow modem/dsl line. Probably nobody was thinking about GBit to the
>> next mirror and 2 weeks worth of diffs.
>> 
>> Maybe the number of diffs kept is too big. Maybe not. To decide that
>> one needs more info about your (and lots of other people's) network
>> structure and update seeds/traffic and then find a good compromise.
>
> I usually apt-get update every other day, and roughly download at
> 700KB/s.

But now you get 12K/s for a 20K file instead of 700K/s for a 4MB
file. Anyway, the speed indicator isn't good with small files.

> If you need more testing and numbers, tell me what you would like me to
> provide you, and I'll try to give you facts rather than impressions :)
>
> Mike

I will refrain from updating for a few days and test this too. I
normally have a cron job that runs update + autoclean every night, so
I never have to wait for it.

MfG
        Goswin



Information forwarded to debian-bugs-dist@lists.debian.org, APT Development Team <deity@lists.debian.org>:
Bug#372712; Package apt. Full text and rfc822 format available.

Acknowledgement sent to Michael Vogt <mvo@debian.org>:
Extra info received and forwarded to list. Copy sent to APT Development Team <deity@lists.debian.org>. Full text and rfc822 format available.

Message #30 received at submit@bugs.debian.org (full text, mbox):

From: Michael Vogt <mvo@debian.org>
To: Mike Hommey <mh+reportbug@glandium.org>, 372712@bugs.debian.org
Cc: Debian Bug Tracking System <submit@bugs.debian.org>
Subject: Re: Bug#372712: apt: Shouldn't download pdiffs in all cases
Date: Mon, 12 Jun 2006 17:10:45 +0200
On Sun, Jun 11, 2006 at 11:20:47AM +0200, Mike Hommey wrote:
> Package: apt
> Version: 0.6.44
> Severity: wishlist

Thanks for your bug report.
 
> When you don't apt-get update for a while, downloading all the pdiffs
> instead of the full file is actually much longer. There should be a limit
> of days without update after which apt would get the full file and not the
> pdiffs.
[..]

Currently the implementation does not limit the number of fetched
pdiff files. This is certainly an interesting option.
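A limit of the kind requested in the report, combined with the time estimate Andrea Mennucc suggests later in this thread, might look like the function below. It is a sketch: the function and parameter names and the 0.5 s per-patch overhead used in the example are assumptions, not apt options. The 20 KB patch size, 4 MB full-file size and 700 KB/s rate are the figures quoted in this thread.

```python
def use_pdiffs(patch_sizes, full_size, rate, per_patch_overhead, max_patches):
    """Return True if fetching the pdiffs is expected to beat fetching the full file.

    patch_sizes: compressed size of each needed patch, in bytes
    full_size: compressed size of the full Packages file, in bytes
    rate: measured download rate, in bytes per second
    per_patch_overhead: assumed seconds lost per patch (request latency plus patching)
    max_patches: hard cap on the number of patches, as the report asks for
    """
    if len(patch_sizes) > max_patches:
        return False
    pdiff_time = sum(patch_sizes) / rate + len(patch_sizes) * per_patch_overhead
    full_time = full_size / rate
    return pdiff_time < full_time
```

With 700 KB/s, three 20 KB patches take about 1.6 s against about 5.7 s for the full file, so the pdiffs win; with thirty patches the per-patch overhead dominates (about 15.9 s) and the full file is faster.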

> PS: Is there a way to totally disable the pdiff stuff? With decent
> bandwidth, it actually takes more time than downloading the full file...
> (Or is the goal to reduce the bandwidth on the server side?)

It is possible to disable the pdiff files with:
apt-get update -o Acquire::PDiffs=false

Cheers,
 Michael

-- 
Linux is not The Answer. Yes is the answer. Linux is The Question. - Neo



Information forwarded to debian-bugs-dist@lists.debian.org, APT Development Team <deity@lists.debian.org>:
Bug#372712; Package apt. Full text and rfc822 format available.

Acknowledgement sent to Michael Vogt <mvo@debian.org>:
Extra info received and forwarded to list. Copy sent to APT Development Team <deity@lists.debian.org>. Full text and rfc822 format available.

Information forwarded to debian-bugs-dist@lists.debian.org, APT Development Team <deity@lists.debian.org>:
Bug#372712; Package apt. Full text and rfc822 format available.

Acknowledgement sent to Julian Mehnle <julian@mehnle.net>:
Extra info received and forwarded to list. Copy sent to APT Development Team <deity@lists.debian.org>. Full text and rfc822 format available.

Message #40 received at 372712@bugs.debian.org (full text, mbox):

From: Julian Mehnle <julian@mehnle.net>
To: 372712@bugs.debian.org
Cc: Mike Hommey <mh@glandium.org>
Subject: Re: Bug#372712: apt: Shouldn't download pdiffs in all cases
Date: Thu, 15 Jun 2006 22:28:30 +0000
Goswin von Brederlow wrote:
> Maybe I'm wrong, but I asked one of the apt maintainers before how the
> downloading works. It should fetch the index, find all the patches
> needed from the old Packages file to the current one and then fire off
> all the downloads together using the http keep-alive option. There
> shouldn't be a delay between downloads. The patching should work in
> parallel with downloads.
>
> Anyway, that is the theory.

That theory sounds great, but it must have been implemented rather badly.

I can confirm that on my system downloading (and processing) even just 
three days' pdiffs takes about 200% as long as downloading the full files 
took before.  This is on a P3 600MHz 256MB on a 2Mbit/s DSL connection.

I have disabled Apt's pdiff "feature" for now.

Information forwarded to debian-bugs-dist@lists.debian.org, APT Development Team <deity@lists.debian.org>:
Bug#372712; Package apt. Full text and rfc822 format available.

Acknowledgement sent to Michelle Konzack <linux4michelle@freenet.de>:
Extra info received and forwarded to list. Copy sent to APT Development Team <deity@lists.debian.org>. Full text and rfc822 format available.

Message #45 received at 372712@bugs.debian.org (full text, mbox):

From: Michelle Konzack <linux4michelle@freenet.de>
To: Mike Hommey <mh+reportbug@glandium.org>, 372712@bugs.debian.org
Subject: Re: Bug#372712: apt: Shouldn't download pdiffs in all cases
Date: Mon, 12 Jun 2006 00:01:58 +0200
Am 2006-06-11 11:20:47, schrieb Mike Hommey:

> PS: Is there a way to totally disable the pdiff stuff? With decent
> bandwidth, it actually takes more time than downloading the full file...
> (Or is the goal to reduce the bandwidth on the server side?)

AFAIK it is for the server side and for people with low bandwidth.
Please note that less than 20% of Internet users have cable or DSL.

Greetings
    Michelle Konzack


-- 
Linux-User #280138 with the Linux Counter, http://counter.li.org/
##################### Debian GNU/Linux Consultant #####################
Michelle Konzack   Apt. 917                  ICQ #328449886
                   50, rue de Soultz         MSM LinuxMichi
0033/6/61925193    67100 Strasbourg/France   IRC #Debian (irc.icq.com)




Information forwarded to debian-bugs-dist@lists.debian.org, APT Development Team <deity@lists.debian.org>:
Bug#372712; Package apt. Full text and rfc822 format available.

Acknowledgement sent to <mjr@phonecoop.coop>:
Extra info received and forwarded to list. Copy sent to APT Development Team <deity@lists.debian.org>. Full text and rfc822 format available.

Message #50 received at 372712@bugs.debian.org (full text, mbox):

From: <mjr@phonecoop.coop>
To: <372712@bugs.debian.org>
Subject: apt: Shouldn't download pdiffs by default yet
Date: Tue, 04 Jul 2006 14:50:34 +0100
I have independently hit this bug.  My download speed slowed from its
usual 100+kB/s average to around 3kB/s - worse than when I used to
use dialup.  It also appeared to be repeatedly downloading the same
pdiffs, but I understand from bug 372504 that it's a cosmetic bug.
Finally, after all the pdiff downloading, it aborted with "internal
rred error".

The next update seemed to download the package files normally,
so I could not reproduce/test the problem.

Please disable pdiff downloading by default until it supports
an upper limit on the number of pdiffs and the display problems
are corrected.

Thank you,
-- 
MJR/slef




Information forwarded to debian-bugs-dist@lists.debian.org, APT Development Team <deity@lists.debian.org>:
Bug#372712; Package apt. Full text and rfc822 format available.

Acknowledgement sent to A Mennucc <debdev@mennucci.sns.it>:
Extra info received and forwarded to list. Copy sent to APT Development Team <deity@lists.debian.org>. Full text and rfc822 format available.

Message #55 received at 372712@bugs.debian.org (full text, mbox):

From: A Mennucc <debdev@mennucci.sns.it>
To: 372712@bugs.debian.org
Subject: as in debdelta
Date: Wed, 19 Jul 2006 17:37:30 +0200
hi

I have the same problem with debdelta, so I want to share some
ideas

introduction: the command "debdelta-upgrade" downloads deltas, and
then applies them to create all the .debs necessary for an
'apt-get upgrade'; so it is doing more or less the same thing

my advice is

1) "debdelta-upgrade" has two threads; one thread
downloads the patches and queues them for a second thread, which
applies them; you may do the same with pdiffs
(actually, I see some people saying that patching is
done in parallel; if so, ignore the above)

2) anyway, it may happen that, in some unfortunate cases, the time of
downloading/patching is so high that it would be more efficient to
just download the new .deb

my solution (which I am implementing in "debdelta-upgrade") is
to keep statistics of the download speed (per server)
and patching speed, and then do the math and decide

you may do the same: each time apt-get update is invoked, look at how
many patches are needed, and do a simple computation to decide whether it
is faster to download all patches or to download the new Packages.bz2

3) AFAIK the current structure of pdiffs is not optimal;
 there is a "binary" structure of diffs that works better, namely:

time complexity for end user:
 if N (< 128) is the number of changes that occurred to the Packages file
 since the last update, the user needs to download and apply
  ~ 2 (1 + log_2 (N))   diffs

disk complexity on server:
 it will use more space on servers; in this proposed version,
 the servers need to store ~ twice the diffs

here is the scheme  (in python-ish/shell-ish pseudocode):

 this is run on the server, after Packages is changed:
 let v be an increasing number, serializing the
 successive versions of the Packages file on the server;
 pdiff.$v.mask.$x turns version v-(x+1) into version v

 def add_packages_file_to_diff_scheme(Packages):
   v = v + 1
   for x in 0 , 1 , 3 , 7 , 15 ,  31  :
    if ( x == 0 ) or  ( x & v == 0 ) :
     diff /backups/Packages.mask.$x  Packages > pdiff.$v.mask.$x
     copy Packages /backups/Packages.mask.$x

 this is run on the client:
 let w be the version number of the Packages file that the user has

 def update_packages():
    while w < v :
      for x in 31 , 15 , 7 , 3 , 1 , 0 :
        if ( ( x == 0 )  or ( x & w == 0 ) ) and ( w + x + 1 <= v ) :
          t = w + x + 1
          download pdiff.$t.mask.$x
          patch pdiff.$t.mask.$x  Packages
          w = t
          break

example: suppose you have version 3 , and the server has version 35 ;
in the current scheme, you would need 32 patches; in my scheme, you need
 pdiff.4.mask.0
 pdiff.8.mask.3
 pdiff.16.mask.7
 pdiff.32.mask.15
 pdiff.34.mask.1
 pdiff.35.mask.0
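The scheme above can be checked with a small Python simulation. It only computes diff names, naming each diff after the version it produces (as in the example list), so `pdiff.<t>.mask.<x>` turns version t-x-1 into version t; no files are actually diffed or patched, and the function names are illustrative.

```python
# Levels x = 2**k - 1; a level-x diff spans x + 1 versions.
LEVELS = (0, 1, 3, 7, 15, 31)


def server_diffs(v):
    """Diff names published when version v appears; pdiff.<v>.mask.<x> turns v-x-1 into v."""
    return [f"pdiff.{v}.mask.{x}" for x in LEVELS if (x & v) == 0]


def client_plan(w, v):
    """Greedy walk from local version w to server version v, largest usable diff first."""
    plan = []
    while w < v:
        for x in reversed(LEVELS):
            # (x & w) == 0 means a level-x diff starts at w; x == 0 always qualifies.
            if (x & w) == 0 and w + x + 1 <= v:
                w += x + 1
                plan.append(f"pdiff.{w}.mask.{x}")
                break
    return plan
```

`client_plan(3, 35)` yields the same six names as the example, and every one of them is among the diffs `server_diffs` publishes for its target version.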

a.

-- 
Andrea Mennucc
 "E' un mondo difficile. Che vita intensa!" (Tonino Carotone)

Information forwarded to debian-bugs-dist@lists.debian.org, APT Development Team <deity@lists.debian.org>:
Bug#372712; Package apt. Full text and rfc822 format available.

Acknowledgement sent to Goswin von Brederlow <brederlo@informatik.uni-tuebingen.de>:
Extra info received and forwarded to list. Copy sent to APT Development Team <deity@lists.debian.org>. Full text and rfc822 format available.

Message #60 received at 372712@bugs.debian.org (full text, mbox):

From: Goswin von Brederlow <brederlo@informatik.uni-tuebingen.de>
To: A Mennucc <debdev@mennucci.sns.it>
Cc: 372712@bugs.debian.org
Subject: Re: Bug#372712: as in debdelta
Date: Thu, 20 Jul 2006 12:59:15 +0200
A Mennucc <debdev@mennucci.sns.it> writes:

> hi
>
> I have the same problem with debdelta, so I want to share some
> ideas
>
> introduction: the command "debdelta-upgrade" downloads deltas, and
> then applies them to create all the .debs necessary for an
> 'apt-get upgrade'; so it is doing more or less the same thing
>
> my advice is
>
> 1) "debdelta-upgrade" has two threads; one thread
> downloads the patches and queues them for a second thread, which
> applies them; you may do the same with pdiffs
> (actually, I see some people saying that patching is
> done in parallel; if so, ignore the above)

pdiff files are currently fetched in a very bad way.

1. fetch index
2. fetch first diff
3. gunzip
4. rred diff and start downloading the next diff (goto 2)

This means that diff downloads are not properly streamlined and take
up valuable time. rred is also awfully slow and is run for every diff.
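For illustration, here is a minimal Python applier for ed-style scripts of the kind `diff --ed` produces, which is the format of the archive's pdiff files that apt's rred method applies. It is a sketch, not rred's implementation: it handles only the `a`, `c` and `d` commands, and it assumes, as `diff --ed` output does, that commands run from the end of the file towards the start, so earlier line numbers stay valid.

```python
import re

# One ed command as emitted by `diff --ed`: "N" or "N,M" followed by a, c or d.
COMMAND = re.compile(r"^(\d+)(?:,(\d+))?([acd])$")


def apply_ed_script(lines, script):
    """Apply an ed-style script (list of lines, no newlines) to a copy of lines."""
    out = list(lines)
    commands = iter(script)
    for command in commands:
        match = COMMAND.match(command)
        if match is None:
            raise ValueError(f"unsupported ed command: {command!r}")
        start = int(match.group(1))
        end = int(match.group(2) or start)
        op = match.group(3)
        text = []
        if op in ("a", "c"):
            # Text for a/c runs until a line consisting of a single ".".
            for line in commands:
                if line == ".":
                    break
                text.append(line)
        if op == "a":
            out[start:start] = text    # insert after line `start` (1-based)
        elif op == "c":
            out[start - 1:end] = text  # replace lines start..end
        else:
            del out[start - 1:end]     # delete lines start..end
    return out
```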

> 2) anyway, it may happen that, in some unfortunate cases, the time of
> downloading/patching is so high that it would be more efficient to
> just download the new .deb
>
> my solution (which I am implementing in "debdelta-upgrade") is
> to keep statistics of the download speed (per server)
> and patching speed, and then do the math and decide
>
> you may do the same: each time apt-get update is invoked, look at how
> many patches are needed, and do a simple computation to decide whether it
> is faster to download all patches or to download the new Packages.bz2
>
> 3) AFAIK the current structure of pdiffs is not optimal;
>  there is a "binary" structure of diffs that works better, namely:
>
> time complexity for end user:
>  if N (< 128) is the number of changes that occurred to the Packages file
>  since the last update, the user needs to download and apply
>   ~ 2 (1 + log_2 (N))   diffs
>
> disk complexity on server:
>  it will use more space on servers; in this proposed version,
>  the servers need to store ~ twice the diffs

Think about this scheme:

You start off with Packages.updates being the Packages file. New
version info for packages is prepended to the Packages.updates file
and the old version info is not (completely) removed. Fields that
remain the same can be omitted in the new info, and fields that
disappeared are included with empty contents. Fields in the old info
that are replaced by the new info can be omitted except for the
MD5sum. Old info that has become too short (e.g. only 3-4 fields left)
can be merged into one of the newer infos. An entry of "Package: -foo"
signals the removal of the package completely.

Updating the Packages file then means fetching the new
Packages.updates file up to the first MD5sum entry that is already
known locally. At that point you have all the changes to update the
Packages file (or the whole file). If downloading, unpacking and
parsing are pipelined, you can stop the downloading and unpacking at
that point. There is some wasted download, especially with bzip2's
chunkiness, but overall that should be minimal.

Note that this needs only one update file for the Packages file for
any number of updates, and the file size can be a fixed amount of
overhead compared to the plain file. The Packages file could also be
replaced completely by this format over time, leaving only a
configurable fixed amount of extra space.
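A client for this proposed format could be sketched as follows. Stanzas are modelled as Python dicts ordered newest first, and `stream` may be a lazy iterator, so reading (and in a real client, downloading) stops at the first already-known MD5sum. The function names are hypothetical; the merge rules (omitted fields keep their value, empty fields are removed, `Package: -foo` removes foo) follow the description above.

```python
def read_updates(stream, known_md5sums):
    """Take stanzas (newest first) until one whose MD5sum is already known locally."""
    pending = []
    for stanza in stream:
        if stanza.get("MD5sum") in known_md5sums:
            break  # everything older is already in the local Packages file
        pending.append(stanza)
    return pending


def apply_updates(packages, pending):
    """Merge pending changes into packages (name -> fields), oldest change first."""
    for stanza in reversed(pending):
        name = stanza["Package"]
        if name.startswith("-"):
            packages.pop(name[1:], None)  # "Package: -foo" removes foo entirely
            continue
        merged = dict(packages.get(name, {}))  # omitted fields keep their old value
        for field, value in stanza.items():
            if value == "":
                merged.pop(field, None)  # empty contents: the field disappeared
            else:
                merged[field] = value
        packages[name] = merged
    return packages
```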

MfG
        Goswin



Information forwarded to debian-bugs-dist@lists.debian.org, APT Development Team <deity@lists.debian.org>:
Bug#372712; Package apt. Full text and rfc822 format available.

Acknowledgement sent to mennucc1@debian.org:
Extra info received and forwarded to list. Copy sent to APT Development Team <deity@lists.debian.org>. Full text and rfc822 format available.

Message #65 received at 372712@bugs.debian.org (full text, mbox):

From: debdev@tonelli.sns.it (A Mennucc)
To: Goswin von Brederlow <brederlo@informatik.uni-tuebingen.de>
Cc: 372712@bugs.debian.org
Subject: Re: Bug#372712: as in debdelta
Date: Thu, 20 Jul 2006 14:30:38 +0200
On Thu, Jul 20, 2006 at 12:59:15PM +0200, Goswin von Brederlow wrote:
>  There is some wasted download, especially with bzip2's
> chunkiness, but overall that should be minimal.

Unfortunately not; 'bzip2 -9' processes data in blocks of 900kB
(uncompressed); since the compression ratio is ~4.4, you would always need to
download ~200kB of data to get the first block
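The ~200kB figure is the block size divided by the compression ratio:

```latex
\frac{900\ \text{kB (one uncompressed bzip2 block)}}{4.4} \approx 205\ \text{kB (compressed)}
```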

a.

-- 
Andrea Mennucc



Information forwarded to debian-bugs-dist@lists.debian.org, APT Development Team <deity@lists.debian.org>:
Bug#372712; Package apt. Full text and rfc822 format available.

Acknowledgement sent to Goswin von Brederlow <brederlo@informatik.uni-tuebingen.de>:
Extra info received and forwarded to list. Copy sent to APT Development Team <deity@lists.debian.org>. Full text and rfc822 format available.

Message #70 received at 372712@bugs.debian.org (full text, mbox):

From: Goswin von Brederlow <brederlo@informatik.uni-tuebingen.de>
To: mennucc1@debian.org
Cc: Goswin von Brederlow <brederlo@informatik.uni-tuebingen.de>, 372712@bugs.debian.org
Subject: Re: Bug#372712: as in debdelta
Date: Thu, 20 Jul 2006 16:58:42 +0200
debdev@tonelli.sns.it (A Mennucc) writes:

> On Thu, Jul 20, 2006 at 12:59:15PM +0200, Goswin von Brederlow wrote:
>>  There is some wasted download, especially with bzip2's
>> chunkiness, but overall that should be minimal.
>
> Unfortunately not; 'bzip2 -9' processes data in blocks of 900kB
> (uncompressed); since the compression ratio is ~4.4, you would always need to
> download ~200kB of data to get the first block
>
> a.
>
> -- 
> Andrea Mennucc

Compared to 2-3MB that isn't bad. For this reason, though, one should
probably use only gzip for that method.

MfG
        Goswin



Information forwarded to debian-bugs-dist@lists.debian.org, APT Development Team <deity@lists.debian.org>:
Bug#372712; Package apt. Full text and rfc822 format available.

Acknowledgement sent to Matt Taggart <taggart@debian.org>:
Extra info received and forwarded to list. Copy sent to APT Development Team <deity@lists.debian.org>. Full text and rfc822 format available.

Message #75 received at 372712@bugs.debian.org (full text, mbox):

From: Matt Taggart <taggart@debian.org>
To: 372712@bugs.debian.org
Subject: apt: periodically roll up pdiffs
Date: Thu, 27 Jul 2006 02:39:49 -0700
I had an idea like the one Andrea Mennucc mentions in #372712 for the problem of
so many pdiffs. The idea is similar to a scheme you might use for nightly
incremental backups. You might run a "zero" backup once a month, a "one"
backup every 15 days, a "two" every 7, a "three" every 3 and a "four" every
day. For example:

 July 2006           Aug 2006
            0        0 4 4 3 2
4 4 3 4 4 3 2    4 3 4 4 3 4 2
4 4 3 4 4 3 2    3 4 4 1 4 4 2
1 3 4 4 3 4 2    4 4 3 4 4 3 2
3 4 4 3 4 4 2    4 3 4 4 1
4 1


On any given day you'd need at most 5 patches and many days far less than 
that.  The reason for doing this is not just to reduce the number of files, 
but the overall data, as a lot of the data in the diff is redundant. Consider 
the case of a package that is updated every day for a month. Under the current 
scheme a client not updating for that month would need to download the 
differences for that package 30 times right? Under an incremental scheme the 
worst case is 5 diffs for that package. It's an even bigger win for longer 
periods of time, the current scheme will start really falling down once we get 
a few more months of pdiffs.

Thanks,

-- 
Matt Taggart
taggart@debian.org





Information forwarded to debian-bugs-dist@lists.debian.org, APT Development Team <deity@lists.debian.org>:
Bug#372712; Package apt. Full text and rfc822 format available.

Acknowledgement sent to Goswin von Brederlow <brederlo@informatik.uni-tuebingen.de>:
Extra info received and forwarded to list. Copy sent to APT Development Team <deity@lists.debian.org>. Full text and rfc822 format available.

Message #80 received at 372712@bugs.debian.org (full text, mbox):

From: Goswin von Brederlow <brederlo@informatik.uni-tuebingen.de>
To: Matt Taggart <taggart@debian.org>
Cc: 372712@bugs.debian.org
Subject: Re: Bug#372712: apt: periodically roll up pdiffs
Date: Thu, 27 Jul 2006 13:57:20 +0200
Matt Taggart <taggart@debian.org> writes:

> I had an idea like the one Andrea Mennucc mentions in #372712 for the problem of
> so many pdiffs. The idea is similar to a scheme you might use for nightly
> incremental backups. You might run a "zero" backup once a month, a "one"
> backup every 15 days, a "two" every 7, a "three" every 3 and a "four" every
> day. For example:
>
>  July 2006           Aug 2006
>             0        0 4 4 3 2
> 4 4 3 4 4 3 2    4 3 4 4 3 4 2
> 4 4 3 4 4 3 2    3 4 4 1 4 4 2
> 1 3 4 4 3 4 2    4 4 3 4 4 3 2
> 3 4 4 3 4 4 2    4 3 4 4 1
> 4 1
>
>
> On any given day you'd need at most 5 patches and many days far less than 
> that.  The reason for doing this is not just to reduce the number of files, 
> but the overall data, as a lot of the data in the diff is redundant. Consider 
> the case of a package that is updated every day for a month. Under the current 
> scheme a client not updating for that month would need to download the 
> differences for that package 30 times right? Under an incremental scheme the 
> worst case is 5 diffs for that package. It's an even bigger win for longer 
> periods of time, the current scheme will start really falling down once we get 
> a few more months of pdiffs.
>
> Thanks,

But then again why have incremental diffs at all?

Two patches can be merged by taking a file with enough unique lines,
applying both patches and diffing again. There is no need to work off
the actual Packages files; they don't have to be stored for this.

It is true that every day the patch files will all grow (minus the
packages with multiple updates in that time), but they aren't that big,
and compression gets better for larger files.


Given the crawling speed of the rred method, downloading more than a
few days' (~300k) worth of patches is slower than fetching the full file
(3MB), even on a slow DSL line. A combined patch would need only one
download, one gunzip and one rred run. I think that would be worth the
increase in space for the patch files.

I would recommend naming the combined patch files after the md5sum
(or sha1) of the Packages/Sources file they patch. That way no index
needs to be downloaded.
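A minimal sketch of the hash-as-name idea, assuming a hypothetical layout in which each combined patch is published as `<hexdigest>.gz`; the function names and the ".gz" suffix are illustrative, only the "name it after the md5sum/sha1 of the file it patches" part comes from the suggestion above.

```python
import hashlib


def combined_patch_name(packages_data: bytes, algorithm: str = "sha1") -> str:
    """Name a combined patch after the hash of the file it applies to.

    The ".gz" suffix is a hypothetical convention for this sketch.
    """
    return hashlib.new(algorithm, packages_data).hexdigest() + ".gz"


def combined_patch_name_for_file(path: str, algorithm: str = "sha1") -> str:
    """Same as combined_patch_name, but streams the local file from disk."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest() + ".gz"
```

A client would hash its current local Packages file, request that single name, and fall back to the full file if the server has no such patch; no index has to be fetched first.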

MfG
        Goswin

-----------------------------------------------------------------------
Sizes for combined patches:

-rw-r--r--  1 reprepro nogroup 26K Jul 27 13:55 comb.2006-07-26-1318.02.gz
-rw-r--r--  1 reprepro nogroup 54K Jul 27 13:55 comb.2006-07-25-1313.19.gz
-rw-r--r--  1 reprepro nogroup 90K Jul 27 13:55 comb.2006-07-24-1338.19.gz
-rw-r--r--  1 reprepro nogroup 132K Jul 27 13:55 comb.2006-07-24-0235.54.gz
-rw-r--r--  1 reprepro nogroup 170K Jul 27 13:55 comb.2006-07-22-1308.51.gz
-rw-r--r--  1 reprepro nogroup 186K Jul 27 13:55 comb.2006-07-21-1255.40.gz
-rw-r--r--  1 reprepro nogroup 206K Jul 27 13:55 comb.2006-07-20-1302.38.gz
-rw-r--r--  1 reprepro nogroup 226K Jul 27 13:56 comb.2006-07-19-1301.33.gz
-rw-r--r--  1 reprepro nogroup 246K Jul 27 13:56 comb.2006-07-18-1311.49.gz
-rw-r--r--  1 reprepro nogroup 289K Jul 27 13:56 comb.2006-07-17-1328.22.gz
-rw-r--r--  1 reprepro nogroup 332K Jul 27 13:56 comb.2006-07-16-2314.28.gz
-rw-r--r--  1 reprepro nogroup 351K Jul 27 13:57 comb.2006-07-15-1308.02.gz
-rw-r--r--  1 reprepro nogroup 370K Jul 27 13:57 comb.2006-07-14-1250.45.gz
-rw-r--r--  1 reprepro nogroup 392K Jul 27 13:57 comb.2006-07-13-1257.25.gz
-rw-r--r--  1 reprepro nogroup 424K Jul 27 13:57 comb.2006-07-12-1242.39.gz
-rw-r--r--  1 reprepro nogroup 443K Jul 27 13:58 comb.2006-07-11-1246.14.gz
-rw-r--r--  1 reprepro nogroup 462K Jul 27 13:58 comb.2006-07-10-1321.18.gz
-rw-r--r--  1 reprepro nogroup 495K Jul 27 13:58 comb.2006-07-10-0029.06.gz
-rw-r--r--  1 reprepro nogroup 538K Jul 27 13:59 comb.2006-07-08-1242.03.gz
-rw-r--r--  1 reprepro nogroup 547K Jul 27 13:59 comb.2006-07-07-1233.30.gz
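To read the listing against the ~3MB full file mentioned above, a small sketch can parse these `ls -l` lines (human-readable sizes) and express each combined patch as a share of the full file. Treating "K" as 1024 bytes and the full file as exactly 3 MiB are assumptions of this sketch.

```python
import re

# Size column with an optional human-readable suffix, e.g. "26K" (1024-based).
SIZE = re.compile(r"(\d+(?:\.\d+)?)([KMG]?)")
MULTIPLIER = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_listing_line(line):
    """Return (file name, size in bytes) from one listing line like those above."""
    fields = line.split()
    size_field, name = fields[4], fields[-1]
    m = SIZE.fullmatch(size_field)
    if m is None:
        raise ValueError(f"unrecognised size: {size_field!r}")
    return name, float(m.group(1)) * MULTIPLIER[m.group(2)]


def percent_of_full(size_bytes, full_bytes=3 * 1024 ** 2):
    """Express a patch size as a percentage of the (assumed 3 MiB) full file."""
    return 100.0 * size_bytes / full_bytes


if __name__ == "__main__":
    sample = [
        "-rw-r--r--  1 reprepro nogroup 26K Jul 27 13:55 comb.2006-07-26-1318.02.gz",
        "-rw-r--r--  1 reprepro nogroup 547K Jul 27 13:59 comb.2006-07-07-1233.30.gz",
    ]
    for line in sample:
        name, size = parse_listing_line(line)
        print(f"{name}: {percent_of_full(size):.1f}% of the full file")
```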



Information forwarded to debian-bugs-dist@lists.debian.org, APT Development Team <deity@lists.debian.org>:
Bug#372712; Package apt. Full text and rfc822 format available.

Acknowledgement sent to Jari Aalto <jari.aalto@cante.net>:
Extra info received and forwarded to list. Copy sent to APT Development Team <deity@lists.debian.org>. Full text and rfc822 format available.

Message #85 received at 372712@bugs.debian.org (full text, mbox):

From: Jari Aalto <jari.aalto@cante.net>
To: Debian Bug Tracking System <372712@bugs.debian.org>
Subject: apt: pdiff - does not work for long upgrade periods (need thresholds)
Date: Thu, 05 Oct 2006 15:52:33 +0300
Package: apt
Version: 0.6.44.2
Followup-For: Bug #372712

As the other reporter already expressed:

   - The .pdiff concept is very good for regular updates
   - *BUT* the .pdiff performs poorly if update periods
     are not frequent

An example:

   - After 4 months, run "apt-get update"
   - It takes hours for the pdiffs to complete over a modem line

SUGGESTION

Add more intelligence to apt so that it selects the
correct method based on various factors:

  - (a) enough days have passed, and/or
  - (b) enough packages have changed

When those conditions are met, threshold settings (preferably
user configurable) would make apt use the normal download method
and fetch the full file. The *.pdiff files would be used only when
the previous update is recent enough.

Or, apt could also calculate the count/size of the *.pdiff files
needed to get the update done, determine whether that would be
"too much", and revert to a regular full-file download.
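The suggested thresholds can be sketched as a single decision function. This is an illustration only: the default values in `PdiffThresholds` are hypothetical (the report asks only that they be user configurable), and the sketch assumes the client already knows the changed-package count and total patch size, for example from the pdiff index.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class PdiffThresholds:
    # Hypothetical defaults; the report only asks for user-configurable values.
    max_days: int = 14
    max_changed_packages: int = 1000
    max_patch_fraction: float = 0.5  # patch bytes as a fraction of the full file


def choose_download(last_update: date, today: date, changed_packages: int,
                    patch_bytes: int, full_bytes: int,
                    limits: Optional[PdiffThresholds] = None) -> str:
    """Return "full" or "pdiff" following the thresholds suggested above."""
    limits = limits or PdiffThresholds()
    if (today - last_update).days > limits.max_days:
        return "full"  # (a) too many days since the last update
    if changed_packages > limits.max_changed_packages:
        return "full"  # (b) too many packages changed
    if patch_bytes > limits.max_patch_fraction * full_bytes:
        return "full"  # the pdiffs would be "too much" to download
    return "pdiff"
```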

-- Package-specific info:

-- (no /etc/apt/preferences present) --


-- (/etc/apt/sources.list present, but not submitted) --


-- System Information:
Debian Release: testing/unstable
  APT prefers unstable
  APT policy: (500, 'unstable'), (500, 'stable')
Architecture: i386 (i686)
Shell:  /bin/sh linked to /bin/dash
Kernel: Linux 2.6.12-1-686
Locale: LANG=C, LC_CTYPE=C (charmap=ISO-8859-1) (ignored: LC_ALL set to en_US)

Versions of packages apt depends on:
ii  libc6                         2.3.6-15   GNU C Library: Shared libraries
ii  libgcc1                       1:4.1.1-8  GCC support library
ii  libstdc++6                    4.1.1-8    The GNU Standard C++ Library v3

Versions of packages apt recommends:
pn  debian-archive-keyring        <none>     (no description available)

-- no debconf information



Information forwarded to debian-bugs-dist@lists.debian.org, APT Development Team <deity@lists.debian.org>:
Bug#372712; Package apt. Full text and rfc822 format available.

Acknowledgement sent to Aurélien Le Provost - Ribaltchenko <aurelien@aurelp.fr.eu.org>:
Extra info received and forwarded to list. Copy sent to APT Development Team <deity@lists.debian.org>. Full text and rfc822 format available.

Message #90 received at 372712@bugs.debian.org (full text, mbox):

From: Aurélien Le Provost - Ribaltchenko <aurelien@aurelp.fr.eu.org>
To: 372712@bugs.debian.org
Cc: julian@mehnle.net, mvo@debian.org
Subject: Re: Bug#372712: apt: Shouldn't download pdiffs in all cases
Date: Wed, 22 Nov 2006 02:52:29 +0100
Hello,

I don't keep my Debian testing up to date, and it's a fact that the pdiff
method is much slower. After a few days (a week?), applying a day-by-day
series of diffs is more expensive than fetching the new Packages file
directly; after a month it's catastrophic.

>> It is possible to disable the pdiff files with:
>> apt-get update -o Acquire::PDiffs=false
>
> I have disabled Apt's pdiff "feature" for now.

Can I ask whether you do this on the command line each time (or with an
alias), or can this option be stored in a /etc/apt/* file?

Regards,
Aurélien.

PS: I see a lot of information (System Information, Versions of packages
apt depends on, Versions of packages apt recommends, etc.) on many
Debian bug reports. Is it added manually or by a script?

-- 
http://www.aurelp.fr.eu.org
"He who neglects the possible in order to seek the impossible is a
fool." Von Clausewitz, Vom Kriege.



Information forwarded to debian-bugs-dist@lists.debian.org, APT Development Team <deity@lists.debian.org>:
Bug#372712; Package apt. Full text and rfc822 format available.

Acknowledgement sent to Sam Morris <sam@robots.org.uk>:
Extra info received and forwarded to list. Copy sent to APT Development Team <deity@lists.debian.org>. Full text and rfc822 format available.

Message #95 received at 372712@bugs.debian.org (full text, mbox):

From: Sam Morris <sam@robots.org.uk>
To: Aurélien Le Provost - Ribaltchenko <aurelien@aurelp.fr.eu.org>, 372712@bugs.debian.org
Cc: julian@mehnle.net, mvo@debian.org
Subject: Re: Bug#372712: apt: Shouldn't download pdiffs in all cases
Date: Wed, 22 Nov 2006 02:18:02 +0000
On Wed, 2006-11-22 at 02:52 +0100, Aurélien Le Provost - Ribaltchenko
wrote:
> >> It is possible to disable the pdiff files with:
> >> apt-get update -o Acquire::PDiffs=false
> >
> > I have disabled Apt's pdiff "feature" for now.
> 
> Can I ask whether you do this on the command line each time (or with an
> alias), or can this option be stored in a /etc/apt/* file?

Sure:

        Acquire { Pdiffs false; };

or 

        Acquire::Pdiffs false;

in /etc/apt/apt.conf. See the man page for apt.conf for more details.

> Regards,
> Aurélien.
> 
> PS: I see a lot of information (System Information, Versions of packages
> apt depends on, Versions of packages apt recommends, etc.) on many
> Debian bug reports. Is it added manually or by a script?

They are generated by the 'reportbug' tool. Usually this is done when a
new bug is reported, but you can use 'reportbug' to followup to an
existing bug report by starting to report a new bug against the same
package, then picking the target bug from the list of already-reported
bugs.

-- 
Sam Morris
http://robots.org.uk/

PGP key id 1024D/5EA01078
3412 EA18 1277 354B 991B  C869 B219 7FDB 5EA0 1078

Information forwarded to debian-bugs-dist@lists.debian.org, APT Development Team <deity@lists.debian.org>:
Bug#372712; Package apt. Full text and rfc822 format available.

Acknowledgement sent to russell@coker.com.au:
Extra info received and forwarded to list. Copy sent to APT Development Team <deity@lists.debian.org>. Full text and rfc822 format available.

Message #100 received at 372712@bugs.debian.org (full text, mbox):

From: Russell Coker <russell@coker.com.au>
To: 372712@bugs.debian.org
Subject: not just daily updates?
Date: Sat, 9 Dec 2006 11:44:24 +1100
I suspect that part of the problem is the large number of updates.  Have there
been any tests with both daily and weekly diffs?  If I have not updated for
13 days then there is going to be a complete calendar week in that time
period, so the download would be one weekly pdiff and 6 daily diffs.

As the download speed appears to depend on the number of diffs rather than
their size, this should decrease the time significantly.

Another possibility might be to have different pdiff files generated based on
powers of two.  So you have a directory of daily diffs, a directory for every
second day, a directory for every four days, and so on up to 128- or 256-day
diffs.  That would mean that the maximum number of diffs needed would be on
the order of the base-2 logarithm of the number of days.  It would also be
easy for modem users to arrange their downloads on days that are a multiple
of 16 to minimise downloads.
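Both layouts above can be checked with a small greedy planner. The sketch assumes that a diff of span s exists from day k*s to day (k+1)*s for every k (weekly diffs start on a week boundary, 16-day diffs on multiples of 16); `plan_patches` and `POWERS_OF_TWO` are hypothetical names. At each step it takes the largest aligned diff that does not overshoot the target day.

```python
def plan_patches(start, end, spans):
    """Greedily cover days start..end with aligned diffs of the given spans.

    Assumes a diff of span s exists from day k*s to day (k+1)*s for every k.
    Returns the list of (from_day, to_day) diffs to download, in order.
    """
    if 1 not in spans:
        raise ValueError("daily (span 1) diffs are needed to reach every day")
    spans = sorted(set(spans), reverse=True)
    day, chain = start, []
    while day < end:
        # Largest aligned span that does not overshoot the target day.
        step = next(s for s in spans if day % s == 0 and day + s <= end)
        chain.append((day, day + step))
        day += step
    return chain


POWERS_OF_TWO = [2 ** k for k in range(9)]  # 1-, 2-, 4-, ..., 256-day diffs
```

With spans [1, 7], going from day 3 to day 16 (13 days) gives four daily diffs, one weekly diff and two more daily diffs, matching the "one weekly and 6 daily" example. With powers of two, a start on a multiple of 16 collapses to very few diffs, but an unlucky pair such as day 1 to day 255 needs 14 diffs (climbing up to a 64-day diff, then back down), so under this alignment the worst case is closer to twice log2 of the number of days.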



Information forwarded to debian-bugs-dist@lists.debian.org, APT Development Team <deity@lists.debian.org>:
Bug#372712; Package apt. Full text and rfc822 format available.

Acknowledgement sent to Matthijs Kooijman <m.kooijman@student.utwente.nl>:
Extra info received and forwarded to list. Copy sent to APT Development Team <deity@lists.debian.org>. Full text and rfc822 format available.

Message #105 received at 372712@bugs.debian.org (full text, mbox):

From: Matthijs Kooijman <m.kooijman@student.utwente.nl>
To: 372712@bugs.debian.org
Subject: Max-Pdiffs option
Date: Wed, 27 Jun 2007 09:23:37 +0200
Hey,

I would propose a simple solution: have an Acquire::Max-Pdiffs option. This
would be the maximum number of pdiffs to download; if more are required, a full
download is done instead. The best value would depend on the actual
connection, but it should be possible to specify a decent default. People with
really fast internet connections can tweak this option themselves (if it gets
documented, see #376029).

Also, this option might replace Acquire::Pdiffs, since Max-Pdiffs = 0 would
mean always downloading full files, and a special value "all" could be used to
always use pdiffs. Later, a special value "auto" might be added once there is
some autodetection mechanism that takes into account connection speed from
previous updates.
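A sketch of how the proposed option values could be interpreted. Acquire::Max-Pdiffs was only a proposal (the limit that eventually shipped is Acquire::PDiffs::FileLimit, per the closing message below), so the function and its handling of "auto" are hypothetical.

```python
def use_pdiffs(max_pdiffs: str, patches_needed: int) -> bool:
    """Interpret the proposed (hypothetical) Acquire::Max-Pdiffs value.

    "all" always uses pdiffs; a number N uses them only when at most N are
    needed, so 0 forces a full download whenever any patch is needed;
    "auto" is left unimplemented because it needs a speed heuristic.
    """
    value = max_pdiffs.strip().lower()
    if value == "all":
        return True
    if value == "auto":
        raise NotImplementedError("'auto' needs a connection-speed heuristic")
    limit = int(value)
    if limit < 0:
        raise ValueError("Max-Pdiffs must be non-negative, 'all' or 'auto'")
    return patches_needed <= limit
```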

Gr.

Matthijs



Reply sent to Daniel Hartwig <mandyke@gmail.com>:
You have taken responsibility. (Thu, 17 May 2012 06:48:13 GMT) Full text and rfc822 format available.

Notification sent to Mike Hommey <mh+reportbug@glandium.org>:
Bug acknowledged by developer. (Thu, 17 May 2012 06:48:13 GMT) Full text and rfc822 format available.

Message #110 received at 372712-done@bugs.debian.org (full text, mbox):

From: Daniel Hartwig <mandyke@gmail.com>
To: 372712-done@bugs.debian.org
Subject: Bug#372712: apt: Shouldn't download pdiffs in all cases
Date: Thu, 17 May 2012 14:47:32 +0800
Version: 0.7.25.1

This has been configurable for some time:

# never use pdiffs:
Acquire::PDiffs "0";

# download up to 5 pdiffs:
Acquire::PDiffs::FileLimit "5";


From the changelog adding the file and size limits:

 apt (0.7.25.1) unstable; urgency=low

  * apt-pkg/acquire-item.cc:
    - add configuration PDiffs::Limit-options to not download
      too many or too big patches (Closes: #554349)

and from apt.conf(5):

  PDiffs
     Try to download deltas called PDiffs for Packages or
     Sources files instead of downloading whole ones. True
     by default.

     Two sub-options to limit the use of PDiffs are also
     available: FileLimit specifies the maximum number of PDiff
     files downloaded to patch a file. SizeLimit, on the other
     hand, is the maximum percentage of the size of all patches
     compared to the size of the targeted file. If one of these
     limits is exceeded, the complete file is downloaded instead
     of the patches.
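The two limits described in apt.conf(5) can be restated as a short check. This is a reading of the manual page text quoted above, not a copy of apt's acquire-item.cc; whether apt compares with ">" or ">=", and exactly which sizes it sums, are assumptions of this sketch.

```python
from typing import Optional, Sequence


def pdiffs_within_limits(patch_sizes: Sequence[int], target_size: int,
                         file_limit: Optional[int] = None,
                         size_limit: Optional[float] = None) -> bool:
    """Return True if patching is allowed under FileLimit/SizeLimit.

    `size_limit` is a percentage: total patch size relative to the size of
    the targeted file. None means the limit is not set.
    """
    if file_limit is not None and len(patch_sizes) > file_limit:
        return False  # too many PDiff files: fetch the complete file
    if size_limit is not None:
        percent = 100.0 * sum(patch_sizes) / target_size
        if percent > size_limit:
            return False  # patches too large relative to the target file
    return True
```

With `Acquire::PDiffs::FileLimit "5";` as in the example above, a sixth needed patch would make apt fetch the complete file instead.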




Bug archived. Request was from Debbugs Internal Request <owner@bugs.debian.org> to internal_control@bugs.debian.org. (Thu, 14 Jun 2012 07:35:42 GMT) Full text and rfc822 format available.



Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Mon Apr 21 07:59:28 2014; Machine Name: buxtehude.debian.org

Debian Bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.