Debian Bug report logs - #455082
enable more efficient batching

version graph

Package: debmirror; Maintainer for debmirror is Colin Watson <cjwatson@debian.org>; Source for debmirror is src:debmirror (PTS, buildd, popcon).

Reported by: Kees Cook <kees@outflux.net>

Date: Sat, 8 Dec 2007 23:03:01 UTC

Severity: normal

Tags: patch

Found in version debmirror/20070123

Reply or subscribe to this bug.

Toggle useless messages

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to debian-bugs-dist@lists.debian.org, Goswin von Brederlow <brederlo@informatik.uni-tuebingen.de>:
Bug#455082; Package debmirror. (full text, mbox, link).


Acknowledgement sent to Kees Cook <kees@outflux.net>:
New Bug report received and forwarded. Copy sent to Goswin von Brederlow <brederlo@informatik.uni-tuebingen.de>. (full text, mbox, link).


Message #5 received at submit@bugs.debian.org (full text, mbox, reply):

From: Kees Cook <kees@outflux.net>
To: Debian Bugs <submit@bugs.debian.org>
Subject: enable more efficient batching
Date: Sat, 8 Dec 2007 15:01:23 -0800
[Message part 1 (text/plain, inline)]
Package: debmirror
Version: 20070123
Severity: normal
Tags: patch

Hello!  This is a rather drastic rewrite of the bulk of the core logic
in debmirror to allow it to do more batched downloads, which makes
things much faster when using rsync.

Mostly what I did was break up the downloading and the verification
passes (they were in the same loop).  Now it will download everything it
needs, then goes back and checks their state.

This could probably use more cleanups, but hopefully it makes a good
starting point.  I didn't want to take it much further without getting
some feedback first.  :)

Thanks for a great mirroring tool!

-Kees

-- 
Kees Cook                                            @outflux.net
[debmirror-batching-change.patch (text/x-diff, attachment)]

Information forwarded to debian-bugs-dist@lists.debian.org, Frans Pop <fjp@debian.org>:
Bug#455082; Package debmirror. (Fri, 28 Aug 2009 19:57:04 GMT) (full text, mbox, link).


Acknowledgement sent to 455082@bugs.debian.org:
Extra info received and forwarded to list. Copy sent to Frans Pop <fjp@debian.org>. (Fri, 28 Aug 2009 19:57:04 GMT) (full text, mbox, link).


Message #10 received at 455082@bugs.debian.org (full text, mbox, reply):

From: Frans Pop <elendil@planet.nl>
To: 455082@bugs.debian.org
Cc: Kees Cook <kees@debian.org>
Subject: Bug#455082: enable more efficient batching
Date: Fri, 28 Aug 2009 21:48:46 +0200
Kees Cook:
> Hello!  This is a rather drastic rewrite of the bulk of the core logic
> in debmirror to allow it to do more batched downloads, which makes
> things much faster when using rsync.

Hi Kees,

JFYI.

I've been looking at that part of the code myself too and there are a few 
bug reports open in that area as well (#389894, #542590). This will 
probably be my next project. Some of my own ideas seem to match what 
you've done in this patch.

I've pulled the patch into my local git repo so that I can study it more 
easily. I've already done some restructuring of my own which means your 
patch needs a major update, but the basic idea should still apply.
I'll probably use it as a major source of inspiration for my own eventual 
implementation.

Cheers,
FJP




Information forwarded to debian-bugs-dist@lists.debian.org, Frans Pop <fjp@debian.org>:
Bug#455082; Package debmirror. (Fri, 28 Aug 2009 20:18:07 GMT) (full text, mbox, link).


Acknowledgement sent to Kees Cook <kees@debian.org>:
Extra info received and forwarded to list. Copy sent to Frans Pop <fjp@debian.org>. (Fri, 28 Aug 2009 20:18:07 GMT) (full text, mbox, link).


Message #15 received at 455082@bugs.debian.org (full text, mbox, reply):

From: Kees Cook <kees@debian.org>
To: 455082@bugs.debian.org
Subject: Re: Bug#455082: enable more efficient batching
Date: Fri, 28 Aug 2009 13:05:35 -0700
On Fri, Aug 28, 2009 at 09:48:46PM +0200, Frans Pop wrote:
> I've been looking at that part of the code myself too and there are a few 
> bug reports open in that area as well (#389894, #542590). This will 
> probably be my next project. Some of my own ideas seem to match what 
> you've done in this patch.
> 
> I've pulled the patch into my local git repo so that I can study it more 
> easily. I've already done some restructuring of my own which means your 
> patch needs a major update, but the basic idea should still apply.
> I'll probably use it as a major source of inspiration for my own eventual 
> implementation.

Excellent!  Let me know if I can help test or if you have any questions
about that patch -- it's a bit big/ugly.  :P

Thanks!

-Kees

-- 
Kees Cook                                            @debian.org




Information forwarded to debian-bugs-dist@lists.debian.org, Frans Pop <fjp@debian.org>:
Bug#455082; Package debmirror. (Sat, 12 Dec 2009 09:27:03 GMT) (full text, mbox, link).


Acknowledgement sent to Kees Cook <kees@debian.org>:
Extra info received and forwarded to list. Copy sent to Frans Pop <fjp@debian.org>. (Sat, 12 Dec 2009 09:27:03 GMT) (full text, mbox, link).


Message #20 received at 455082@bugs.debian.org (full text, mbox, reply):

From: Kees Cook <kees@debian.org>
To: 455082@bugs.debian.org
Subject: patch splitting
Date: Sat, 12 Dec 2009 01:18:23 -0800
[Message part 1 (text/plain, inline)]
Hi!

I've started to split up some of the various changes in my monolithic
original rsync-batching patch.  I haven't done the final batching logic
changes yet, but here is a stack of patches for various incremental
improvements:

default-settings.patch
silence-errors.patch
functionalize.patch
drop-redundant-rsync.patch
check_file-return.patch
command-exit-checking.patch

I'll continue working on the main portion of the batching logic.

Thanks!

-Kees

-- 
Kees Cook                                            @debian.org
[default-settings.patch (text/x-diff, attachment)]
[silence-errors.patch (text/x-diff, attachment)]
[functionalize.patch (text/x-diff, attachment)]
[drop-redundant-rsync.patch (text/x-diff, attachment)]
[check_file-return.patch (text/x-diff, attachment)]
[command-exit-checking.patch (text/x-diff, attachment)]

Information forwarded to debian-bugs-dist@lists.debian.org, Frans Pop <fjp@debian.org>:
Bug#455082; Package debmirror. (Mon, 05 Apr 2010 19:48:09 GMT) (full text, mbox, link).


Acknowledgement sent to Kees Cook <kees@debian.org>:
Extra info received and forwarded to list. Copy sent to Frans Pop <fjp@debian.org>. (Mon, 05 Apr 2010 19:48:09 GMT) (full text, mbox, link).


Message #25 received at 455082@bugs.debian.org (full text, mbox, reply):

From: Kees Cook <kees@debian.org>
To: 455082@bugs.debian.org
Subject: updates
Date: Mon, 5 Apr 2010 12:35:19 -0700
[Message part 1 (text/plain, inline)]
Hello,

This set of patches has been updated to apply to debmirror 1:2.4.4.

(Still haven't reworked the core batching changes, but these are all
useful on their own...)

Thanks!

-Kees

-- 
Kees Cook                                            @debian.org
[default-settings.patch (text/x-diff, attachment)]
[silence-errors.patch (text/x-diff, attachment)]
[functionalize.patch (text/x-diff, attachment)]
[drop-redundant-rsync.patch (text/x-diff, attachment)]
[check_file-return.patch (text/x-diff, attachment)]
[command-exit-checking.patch (text/x-diff, attachment)]

Information forwarded to debian-bugs-dist@lists.debian.org, Frans Pop <fjp@debian.org>:
Bug#455082; Package debmirror. (Sun, 11 Apr 2010 18:33:03 GMT) (full text, mbox, link).


Acknowledgement sent to Kees Cook <kees@debian.org>:
Extra info received and forwarded to list. Copy sent to Frans Pop <fjp@debian.org>. (Sun, 11 Apr 2010 18:33:03 GMT) (full text, mbox, link).


Message #30 received at 455082@bugs.debian.org (full text, mbox, reply):

From: Kees Cook <kees@debian.org>
To: 455082@bugs.debian.org
Subject: updated patch
Date: Sun, 11 Apr 2010 11:30:23 -0700
[Message part 1 (text/plain, inline)]
Updated "drop-redundant-rsync" patch attached...

-- 
Kees Cook                                            @debian.org
[drop-redundant-rsync.patch (text/x-diff, attachment)]

Information forwarded to debian-bugs-dist@lists.debian.org, Joey Hess <joeyh@debian.org>:
Bug#455082; Package debmirror. (Mon, 06 Sep 2010 22:54:03 GMT) (full text, mbox, link).


Acknowledgement sent to Joey Hess <joey@kitenet.net>:
Extra info received and forwarded to list. Copy sent to Joey Hess <joeyh@debian.org>. (Mon, 06 Sep 2010 22:54:04 GMT) (full text, mbox, link).


Message #35 received at 455082@bugs.debian.org (full text, mbox, reply):

From: Joey Hess <joey@kitenet.net>
To: Kees Cook <kees@outflux.net>
Cc: 455082@bugs.debian.org
Subject: debmirror patches
Date: Mon, 6 Sep 2010 18:51:00 -0400
[Message part 1 (text/plain, inline)]
I'm looking over the stack of patches you sent for debmirror earlier.

default-settings.patch:

* rsync batching size: I see no problem with it, but rsync
  batching was added after I wrote debmirror, and I don't
  understand why it's needed anyway
* no motd: probably ok; the motd could perhaps be displayed
  on the first rsync call and not subsequent ones
* removing -I flag: Was also added after I wrote debmirror.
  Seems ok to remove to me, but I am curious how it's "troublesome".
* stdout auto-flushing: only needed if something writes
  to stdout without \n, which nothing seems to?

silence-errors.patch:

Before silencing any errors I always like to think about
how the error could occur. So, is there any case where the find
calls fail with an error, that is not itself an error?

functionalize.patch:

applied to my 'patchy' branch in my debmirror git repo
(git://git.debian.org/collab-maint/debmirror.git)

drop-redundant-rsync.patch:

Seems probably ok, but I think I need to understand why the rsync
code is so complicated by batching, etc first.

check_file-return.patch

Won't this result in a *lot* of spew about "Missing: $file"
when starting a fresh mirror in verbose mode?

command-exit-checking.patch

applied to my 'patchy' branch in my debmirror git repo.

-- 
see shy jo
[signature.asc (application/pgp-signature, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org, Joey Hess <joeyh@debian.org>:
Bug#455082; Package debmirror. (Wed, 08 Sep 2010 17:54:12 GMT) (full text, mbox, link).


Acknowledgement sent to Kees Cook <kees@debian.org>:
Extra info received and forwarded to list. Copy sent to Joey Hess <joeyh@debian.org>. (Wed, 08 Sep 2010 17:54:12 GMT) (full text, mbox, link).


Message #40 received at 455082@bugs.debian.org (full text, mbox, reply):

From: Kees Cook <kees@debian.org>
To: Joey Hess <joey@kitenet.net>
Cc: 455082@bugs.debian.org
Subject: Re: debmirror patches
Date: Wed, 8 Sep 2010 10:51:47 -0700
Hi Joey,

On Mon, Sep 06, 2010 at 06:51:00PM -0400, Joey Hess wrote:
> I'm looking over the stack of patches you sent for debmirror earlier.
> 
> default-settings.patch:
> 
> * rsync batching size: I see no problem with it, but rsync
>   batching was added after I wrote debmirror, and I don't
>   understand why it's needed anyway

I'm not sure either, but it's possible that some servers may not allow
long-running connections.

> * no motd: probably ok; the motd could perhaps be displayed
>   on the first rsync call and not subsequent ones

True, but I suspect it has actually no real value -- the selected server is
already known to the user, etc.

> * removing -I flag: Was also added after I wrote debmirror.
>   Seems ok to remove to me, but I am curious how it's "troublesome".

Honestly, I've forgotten now, but I think it was missing files sometimes.

> * stdout auto-flushing: only needed if something writes
>   to stdout without \n, which nothing seems to?

This may have been related to debugging work I was doing. Or it was related
to trying to synchronize perl's output with subprocess output.

> silence-errors.patch:
> 
> Before silencing any errors I always like to think about
> how the error could occur. So, is there any case where the find
> calls fail with an error, that is not itself an error?

Yes, there are a number of cases where it's just looking for things
existing or not, and it would complain loudly for newly added repos, IIRC.

> functionalize.patch:
> 
> applied to my 'patchy' branch in my debmirror git repo
> (git://git.debian.org/collab-maint/debmirror.git)

Thanks!

> drop-redundant-rsync.patch:
> 
> Seems probably ok, but I think I need to understand why the rsync
> code is so complicated by batching, etc first.

Sure, though I think this is a pretty clear win for simplifying that loop.

> check_file-return.patch
> 
> Won't this result in a *lot* of spew about "Missing: $file"
> when starting a fresh mirror in verbose mode?

Yes, but that's what you'd want in verbose mode, right?

> command-exit-checking.patch
> 
> applied to my 'patchy' branch in my debmirror git repo.

Thanks!

-Kees

-- 
Kees Cook                                            @debian.org




Information forwarded to debian-bugs-dist@lists.debian.org:
Bug#455082; Package debmirror. (Thu, 09 Sep 2010 18:30:26 GMT) (full text, mbox, link).


Acknowledgement sent to Joey Hess <joeyh@debian.org>:
Extra info received and forwarded to list. (Thu, 09 Sep 2010 18:30:26 GMT) (full text, mbox, link).


Message #45 received at 455082@bugs.debian.org (full text, mbox, reply):

From: Joey Hess <joeyh@debian.org>
To: Kees Cook <kees@debian.org>, 455082@bugs.debian.org
Subject: Re: Bug#455082: debmirror patches
Date: Thu, 9 Sep 2010 14:29:33 -0400
[Message part 1 (text/plain, inline)]
Kees Cook wrote:
> > * rsync batching size: I see no problem with it, but rsync
> >   batching was added after I wrote debmirror, and I don't
> >   understand why it's needed anyway
> 
> I'm not sure either, but it's possible that some servers may not allow
> long-running connections.

Guess so. It could reconnect if a sever dropped a connection though.
May need to ask mrvn, who added it, for the reason.

> > * no motd: probably ok; the motd could perhaps be displayed
> >   on the first rsync call and not subsequent ones
> 
> True, but I suspect it has actually no real value -- the selected server is
> already known to the user, etc.

I guess two things I've seen rsync motds used for are
a) this is the cool entity who provides this bandwidth
b) actual policy / news stuff for the server

Maybe it should only be shown, once, in verbose mode?

> > * removing -I flag: Was also added after I wrote debmirror.
> >   Seems ok to remove to me, but I am curious how it's "troublesome".
> 
> Honestly, I've forgotten now, but I think it was missing files sometimes.

Well, -I *avoids* skipping files that have the same size and time, so
*removing* it could, in theory, cause bad files to be kept, but I don't
see how keeping it and doing more checks could.

> > Before silencing any errors I always like to think about
> > how the error could occur. So, is there any case where the find
> > calls fail with an error, that is not itself an error?
> 
> Yes, there are a number of cases where it's just looking for things
> existing or not, and it would complain loudly for newly added repos, IIRC.

Hmm, I have been running debmirror for a week building a new repo, and
have not seen a find error yet..

> > check_file-return.patch
> > 
> > Won't this result in a *lot* of spew about "Missing: $file"
> > when starting a fresh mirror in verbose mode?
> 
> Yes, but that's what you'd want in verbose mode, right?

I dunno, all I want in verbose mode is an indication of what it's
downloading, and maybe some progress info. I can guess that it's
downloading the file because it's not present, that seems to obvious to
say.

(I did like the part of the patch that said when it had to download an existing
file for some other reason.)

-- 
see shy jo
[signature.asc (application/pgp-signature, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org, Joey Hess <joeyh@debian.org>:
Bug#455082; Package debmirror. (Thu, 09 Sep 2010 18:39:03 GMT) (full text, mbox, link).


Acknowledgement sent to Kees Cook <kees@debian.org>:
Extra info received and forwarded to list. Copy sent to Joey Hess <joeyh@debian.org>. (Thu, 09 Sep 2010 18:39:03 GMT) (full text, mbox, link).


Message #50 received at 455082@bugs.debian.org (full text, mbox, reply):

From: Kees Cook <kees@debian.org>
To: Joey Hess <joeyh@debian.org>
Cc: 455082@bugs.debian.org
Subject: Re: Bug#455082: debmirror patches
Date: Thu, 9 Sep 2010 11:38:22 -0700
On Thu, Sep 09, 2010 at 02:29:33PM -0400, Joey Hess wrote:
> Kees Cook wrote:
> > > * no motd: probably ok; the motd could perhaps be displayed
> > >   on the first rsync call and not subsequent ones
> > 
> > True, but I suspect it has actually no real value -- the selected server is
> > already known to the user, etc.
> 
> I guess two things I've seen rsync motds used for are
> a) this is the cool entity who provides this bandwidth
> b) actual policy / news stuff for the server
> 
> Maybe it should only be shown, once, in verbose mode?

Yeah, sounds about right.

> > > Before silencing any errors I always like to think about
> > > how the error could occur. So, is there any case where the find
> > > calls fail with an error, that is not itself an error?
> > 
> > Yes, there are a number of cases where it's just looking for things
> > existing or not, and it would complain loudly for newly added repos, IIRC.
> 
> Hmm, I have been running debmirror for a week building a new repo, and
> have not seen a find error yet..

Well, let's ignore this patch for now.

> > > check_file-return.patch
> > > 
> > > Won't this result in a *lot* of spew about "Missing: $file"
> > > when starting a fresh mirror in verbose mode?
> > 
> > Yes, but that's what you'd want in verbose mode, right?
> 
> I dunno, all I want in verbose mode is an indication of what it's
> downloading, and maybe some progress info. I can guess that it's
> downloading the file because it's not present, that seems to obvious to
> say.

Fair enough; I think I made this change to help debug the "proper" batching
rsync patch that I haven't reworked against the new code yet.

I'll wait for another stable debmirror release, and try to port the rsync
patches I had for fetching all the Release/Packages/etc files in one go
instead of fetching them individually like it does at the moment.

Thanks!

-Kees

-- 
Kees Cook                                            @debian.org




Information forwarded to debian-bugs-dist@lists.debian.org:
Bug#455082; Package debmirror. (Sun, 26 Sep 2010 20:33:03 GMT) (full text, mbox, link).


Acknowledgement sent to Joey Hess <joeyh@debian.org>:
Extra info received and forwarded to list. (Sun, 26 Sep 2010 20:33:03 GMT) (full text, mbox, link).


Message #55 received at 455082@bugs.debian.org (full text, mbox, reply):

From: Joey Hess <joeyh@debian.org>
To: Kees Cook <kees@outflux.net>
Cc: 455082@bugs.debian.org
Subject: Re: debmirror patches
Date: Sun, 26 Sep 2010 16:30:24 -0400
[Message part 1 (text/plain, inline)]
Joey Hess wrote:
> * no motd: probably ok; the motd could perhaps be displayed
>   on the first rsync call and not subsequent ones

Deep in #553604, it's mentioned that some rsync mirrors are known
to fail if rsync is run with --no-motd. This makes me leery of
this patch, although the mirror mentioned, ftp.nl.debian.org,
seems to work now with --no-motd.

-- 
see shy jo
[signature.asc (application/pgp-signature, inline)]

Send a report that this bug log contains spam.


Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Sun Jan 7 04:18:33 2018; Machine Name: buxtehude

Debian Bug tracking system

Debbugs is free software and licensed under the terms of the GNU Public License version 2. The current version can be obtained from https://bugs.debian.org/debbugs-source/.

Copyright © 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson, 2005-2017 Don Armstrong, and many other contributors.