Debian Bug report logs - #877418
dh-strip-nondeterminism: kills clojure performance

Package: dh-strip-nondeterminism; Maintainer for dh-strip-nondeterminism is Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>; Source for dh-strip-nondeterminism is src:strip-nondeterminism.

Reported by: Rob Browning <rlb@defaultvalue.org>

Date: Sun, 1 Oct 2017 15:51:01 UTC

Severity: normal

Tags: confirmed, moreinfo

Found in version strip-nondeterminism/0.034-1

Done: Rob Browning <rlb@defaultvalue.org>

Bug is archived. No further changes may be made.


Report forwarded to debian-bugs-dist@lists.debian.org, Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>:
Bug#877418; Package dh-strip-nondeterminism. (Sun, 01 Oct 2017 15:51:04 GMT) (full text, mbox, link).


Acknowledgement sent to Rob Browning <rlb@defaultvalue.org>:
New Bug report received and forwarded. Copy sent to Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>. (Sun, 01 Oct 2017 15:51:04 GMT) (full text, mbox, link).


Message #5 received at submit@bugs.debian.org (full text, mbox, reply):

From: Rob Browning <rlb@defaultvalue.org>
To: submit@bugs.debian.org
Subject: dh-strip-nondeterminism: kills clojure performance
Date: Sun, 01 Oct 2017 10:50:21 -0500
Package: dh-strip-nondeterminism
Version: 0.034-1

I noticed that Debian's clojure-1.8.0.jar had terrible performance as
compared to both the upstream jar and one built manually via the "mvn
package" or ant process, and after some investigation, I think I've
tracked it down to dh-strip-nondeterminism.

Given the current clojure 1.8.0-2 source tree, adding this to
debian/rules:

  # Ask clojure to do nothing
  define timeclj
    time java -cp debian/libclojure-java/usr/share/java/clojure-1.8.0.jar \
      clojure.main -e ''
  endef

  override_dh_strip_nondeterminism:
          $(timeclj)
          dh_strip_nondeterminism
          $(timeclj)

and then running "fakeroot debian/rules binary" produces this:

  time java -cp debian/libclojure-java/usr/share/java/clojure-1.8.0.jar clojure.main -e ''

  real    0m0.919s
  user    0m1.739s
  sys     0m0.064s
  dh_strip_nondeterminism
  time java -cp debian/libclojure-java/usr/share/java/clojure-1.8.0.jar clojure.main -e ''

  real    0m4.064s
  user    0m12.204s
  sys     0m0.140s
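
For anyone wanting to repeat the comparison outside debian/rules, here is a rough Python sketch of the same before/after measurement. The helper name time_command is made up for this example, and the command actually timed is a harmless stand-in so that the sketch runs without Java installed.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run argv once and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


# Stand-in command so the sketch runs anywhere. For the case in this report
# one would instead pass something like:
#   ["java", "-cp", "debian/libclojure-java/usr/share/java/clojure-1.8.0.jar",
#    "clojure.main", "-e", ""]
elapsed = time_command([sys.executable, "-c", "pass"])
print(f"{elapsed:.3f}s wall clock")
```

Running it once before and once after dh_strip_nondeterminism should give roughly the "real" figures that time(1) reports above.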

Thanks
-- 
Rob Browning
rlb @defaultvalue.org and @debian.org
GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A
GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4



Information forwarded to debian-bugs-dist@lists.debian.org, Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>:
Bug#877418; Package dh-strip-nondeterminism. (Mon, 02 Oct 2017 10:42:05 GMT) (full text, mbox, link).


Acknowledgement sent to Chris Lamb <lamby@debian.org>:
Extra info received and forwarded to list. Copy sent to Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>. (Mon, 02 Oct 2017 10:42:05 GMT) (full text, mbox, link).


Message #10 received at 877418@bugs.debian.org (full text, mbox, reply):

From: Chris Lamb <lamby@debian.org>
To: Rob Browning <rlb@defaultvalue.org>, 877418@bugs.debian.org
Subject: Re: Bug#877418: dh-strip-nondeterminism: kills clojure performance
Date: Mon, 02 Oct 2017 11:40:29 +0100
tags 877418 + confirmed
thanks

Hi Rob,

> I noticed that Debian's clojure-1.8.0.jar had terrible performance as
> compared to both the upstream jar

Oh boy, this sounds fun!

I can confirm it is due to strip-nondeterminism. In particular, the
part that sets the last modified date of the .jar contents (!):

  --- a/lib/File/StripNondeterminism/handlers/zip.pm
  +++ b/lib/File/StripNondeterminism/handlers/zip.pm
  @@ -198,8 +198,6 @@ sub normalize {
   		$zip->addMember($member);
   		$options{member_normalizer}->($member)
   		  if exists $options{member_normalizer};
  -		$member->setLastModFileDateTimeFromUnix(
  -			$File::StripNondeterminism::canonical_time // SAFE_EPOCH);
   		if ($member->fileAttributeFormat() == FA_UNIX) {
   			$member->unixFileAttributes(
   				($member->unixFileAttributes() & oct(100))

Applying this hunk removes the observed performance regression entirely,
despite it altering the .jar (different sha1sum, etc.).

What might be a useful/relevant detail here is that if I apply the following
diff, *clamping* the time rather than always setting it:

  --- a/lib/File/StripNondeterminism/handlers/zip.pm
  +++ b/lib/File/StripNondeterminism/handlers/zip.pm
  @@ -198,8 +198,9 @@ sub normalize {
   		$zip->addMember($member);
   		$options{member_normalizer}->($member)
   		  if exists $options{member_normalizer};
  -		$member->setLastModFileDateTimeFromUnix(
  -			$File::StripNondeterminism::canonical_time // SAFE_EPOCH);
  +		my $canonical_time = $File::StripNondeterminism::canonical_time // SAFE_EPOCH;
  +		$member->setLastModFileDateTimeFromUnix($canonical_time)
  +		  if $member->lastModTime() > $canonical_time;
   		if ($member->fileAttributeFormat() == FA_UNIX) {
   			$member->unixFileAttributes(
   				($member->unixFileAttributes() & oct(100))

… I get about a 25% performance regression:

 1.23s user 0.06s system 191% cpu 0.673 total
 2.08s user 0.09s system 231% cpu 0.940 total

Also, setting $canonical_time far in the future results in zero
performance regression again.

This makes no sense whatsoever unless, perhaps, Java is ignoring .class
files at runtime based on their modification date compared to the current
time...?
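
To make the difference between the two behaviours concrete, here is a small Python model of "always set" versus "clamp". It is only a sketch: the function names and all timestamps are invented for illustration, and it works on plain integers rather than on a real archive.

```python
def set_always(mtime, canonical_time):
    """Model of the current behaviour: every member gets the canonical time."""
    return canonical_time


def clamp(mtime, canonical_time):
    """Model of the second diff: only timestamps newer than the canonical
    time are lowered to it; older ones are left alone."""
    return min(mtime, canonical_time)


# Hypothetical Unix timestamps: a source file older than the canonical time
# and a class file produced during the build (newer than it).
canonical_time = 1_500_000_000
members = {"foo.clj": 1_400_000_000, "foo.class": 1_506_000_000}

always = {name: set_always(t, canonical_time) for name, t in members.items()}
clamped = {name: clamp(t, canonical_time) for name, t in members.items()}

# A canonical time far in the future leaves every timestamp untouched.
future = {name: clamp(t, 4_000_000_000) for name, t in members.items()}

print(always["foo.class"] > always["foo.clj"])    # False: the pair is now identical
print(clamped["foo.class"] > clamped["foo.clj"])  # True: the older source stays older
print(future == members)                          # True
```

If something at runtime compares the timestamps of two members, collapsing them to one value would change the outcome, whereas clamping only does so for pairs that were both newer than the canonical time. That would at least be consistent with the partial regression seen above.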


Regards,

-- 
      ,''`.
     : :'  :     Chris Lamb
     `. `'`      lamby@debian.org / chris-lamb.co.uk
       `-



Information forwarded to debian-bugs-dist@lists.debian.org, Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>:
Bug#877418; Package dh-strip-nondeterminism. (Mon, 02 Oct 2017 10:51:04 GMT) (full text, mbox, link).


Acknowledgement sent to Chris Lamb <lamby@debian.org>:
Extra info received and forwarded to list. Copy sent to Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>. (Mon, 02 Oct 2017 10:51:05 GMT) (full text, mbox, link).


Message #15 received at 877418@bugs.debian.org (full text, mbox, reply):

From: Chris Lamb <lamby@debian.org>
To: 877418@bugs.debian.org
Cc: Rob Browning <rlb@defaultvalue.org>
Subject: Re: dh-strip-nondeterminism: kills clojure performance
Date: Mon, 02 Oct 2017 11:48:05 +0100
tags 877418 + confirmed
thanks


Regards,

-- 
      ,''`.
     : :'  :     Chris Lamb
     `. `'`      lamby@debian.org / chris-lamb.co.uk
       `-



Added tag(s) confirmed. Request was from Chris Lamb <lamby@debian.org> to control@bugs.debian.org. (Mon, 02 Oct 2017 10:51:10 GMT) (full text, mbox, link).


Information forwarded to debian-bugs-dist@lists.debian.org, Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>:
Bug#877418; Package dh-strip-nondeterminism. (Mon, 02 Oct 2017 11:03:09 GMT) (full text, mbox, link).


Acknowledgement sent to Chris Lamb <lamby@debian.org>:
Extra info received and forwarded to list. Copy sent to Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>. (Mon, 02 Oct 2017 11:03:09 GMT) (full text, mbox, link).


Message #22 received at 877418@bugs.debian.org (full text, mbox, reply):

From: Chris Lamb <lamby@debian.org>
To: 877418@bugs.debian.org, Rob Browning <rlb@defaultvalue.org>
Subject: Re: Bug#877418: dh-strip-nondeterminism: kills clojure performance
Date: Mon, 02 Oct 2017 12:00:36 +0100
Chris Lamb wrote:

> > I noticed that Debian's clojure-1.8.0.jar had terrible performance as
> > compared to both the upstream jar
> 
> Oh boy, this sounds fun!

There's no obvious reason at this point why this performance regression is
limited to Clojure, unless — hopefully — it's related to the .clj files?

ie. this could be affecting the performance of all Java applications
in Debian (!)


Regards,

-- 
      ,''`.
     : :'  :     Chris Lamb
     `. `'`      lamby@debian.org / chris-lamb.co.uk
       `-



Information forwarded to debian-bugs-dist@lists.debian.org, Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>:
Bug#877418; Package dh-strip-nondeterminism. (Mon, 02 Oct 2017 22:39:03 GMT) (full text, mbox, link).


Acknowledgement sent to Emmanuel Bourg <emmanuel.bourg@gmail.com>:
Extra info received and forwarded to list. Copy sent to Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>. (Mon, 02 Oct 2017 22:39:03 GMT) (full text, mbox, link).


Message #27 received at 877418@bugs.debian.org (full text, mbox, reply):

From: Emmanuel Bourg <emmanuel.bourg@gmail.com>
To: 877418@bugs.debian.org
Cc: Chris Lamb <lamby@debian.org>
Subject: Re: Bug#877418: dh-strip-nondeterminism: kills clojure performance
Date: Tue, 3 Oct 2017 00:34:46 +0200
On Mon, 02 Oct 2017 12:00:36 +0100 Chris Lamb <lamby@debian.org> wrote:

> There's no obvious reason at this point why this performance regression is
> limited to Clojure, unless — hopefully — it's related to the .clj files?
> 
> ie. this could be affecting the performance of all Java applications
> in Debian (!)

Hey having fun with a Java puzzle and not telling the Java Team? That's
mean ;)

I quickly investigated this, it looks like the .clj files bundled in
clojure.jar are recompiled every time clojure is invoked if the jar was
processed by strip-nondeterminism. My guess was that the .clj files are
recompiled if the associated .class file is older, but it also happens
if they have the same date. I eventually found this check performed in
the load() method of RT.java:

  if((classURL != null &&
      (cljURL == null
         || lastModified(classURL, classfile) > lastModified(cljURL, scriptfile)))

Changing '>' with '>=' fixes the issue.
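
As a rough model of that decision (Python, not the actual Clojure source; the function name and the "inclusive" switch are invented here, and only the quoted fragment is modelled, not the whole load() method):

```python
def loads_precompiled(class_mtime, clj_mtime, inclusive=False):
    """Return True if the .class file would be used instead of recompiling.

    class_mtime and clj_mtime are modification times, or None when the
    corresponding file is absent from the classpath.
    """
    if class_mtime is None:
        return False
    if clj_mtime is None:
        return True
    if inclusive:
        return class_mtime >= clj_mtime  # the proposed '>=' variant
    return class_mtime > clj_mtime       # the existing strict '>' check


# After strip-nondeterminism both members carry the same timestamp.
same = 315576060  # arbitrary example value
print(loads_precompiled(same, same))                  # False: recompile every time
print(loads_precompiled(same, same, inclusive=True))  # True: use the .class file
```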

Emmanuel Bourg



Information forwarded to debian-bugs-dist@lists.debian.org, Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>:
Bug#877418; Package dh-strip-nondeterminism. (Tue, 03 Oct 2017 01:24:03 GMT) (full text, mbox, link).


Acknowledgement sent to Rob Browning <rlb@defaultvalue.org>:
Extra info received and forwarded to list. Copy sent to Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>. (Tue, 03 Oct 2017 01:24:03 GMT) (full text, mbox, link).


Message #32 received at 877418@bugs.debian.org (full text, mbox, reply):

From: Rob Browning <rlb@defaultvalue.org>
To: Chris Lamb <lamby@debian.org>, 877418@bugs.debian.org
Subject: Re: Bug#877418: dh-strip-nondeterminism: kills clojure performance
Date: Mon, 02 Oct 2017 20:15:06 -0500
Chris Lamb <lamby@debian.org> writes:

> Chris Lamb wrote:
>
>> > I noticed that Debian's clojure-1.8.0.jar had terrible performance as
>> > compared to both the upstream jar
>> 
>> Oh boy, this sounds fun!
>
> There's no obvious reason at this point why this performance regression is
> limited to Clojure, unless — hopefully — it's related to the .clj files?
>
> ie. this could be affecting the performance of all Java applications
> in Debian (!)

I wondered if Clojure might be trying to be clever there, and...

  https://stackoverflow.com/questions/19594360/preserving-timestamps-on-clojure-clj-files-when-building-shaded-jar-via-maven-s

So maybe if you ensure the class files are newer than the .clj files?

Thanks for the help
-- 
Rob Browning
rlb @defaultvalue.org and @debian.org
GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A
GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4



Information forwarded to debian-bugs-dist@lists.debian.org, Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>:
Bug#877418; Package dh-strip-nondeterminism. (Tue, 03 Oct 2017 08:36:02 GMT) (full text, mbox, link).


Acknowledgement sent to Chris Lamb <lamby@debian.org>:
Extra info received and forwarded to list. Copy sent to Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>. (Tue, 03 Oct 2017 08:36:02 GMT) (full text, mbox, link).


Message #37 received at 877418@bugs.debian.org (full text, mbox, reply):

From: Chris Lamb <lamby@debian.org>
To: Emmanuel Bourg <emmanuel.bourg@gmail.com>, 877418@bugs.debian.org
Subject: Re: Bug#877418: dh-strip-nondeterminism: kills clojure performance
Date: Tue, 03 Oct 2017 09:32:39 +0100
Hi Emmanuel,

> I eventually found this check performed in the load() method of RT.java:
> 
>   if((classURL != null &&
>       (cljURL == null
>          || lastModified(classURL, classfile) > lastModified(cljURL, scriptfile)))
> 
> Changing '>' with '>=' fixes the issue.

Great stuff! So, we have two options as I see it:

  a) We patch clojure with ">="  (and send it upstream, etc. etc.)

  b) We make strip-nondeterminism subtract 1 second from the .clj files'
     target modification times so it matches with the existing ">".

My preference is for "a)", naturally...

> Hey having fun with a Java puzzle and not telling the Java Team? That's
> mean ;)

I was slightly scared we had broken Java performance throughout
Debian! *g*


Best wishes,

-- 
      ,''`.
     : :'  :     Chris Lamb
     `. `'`      lamby@debian.org / chris-lamb.co.uk
       `-



Information forwarded to debian-bugs-dist@lists.debian.org, Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>:
Bug#877418; Package dh-strip-nondeterminism. (Tue, 03 Oct 2017 12:03:06 GMT) (full text, mbox, link).


Acknowledgement sent to Apollon Oikonomopoulos <apoikos@debian.org>:
Extra info received and forwarded to list. Copy sent to Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>. (Tue, 03 Oct 2017 12:03:06 GMT) (full text, mbox, link).


Message #42 received at 877418@bugs.debian.org (full text, mbox, reply):

From: Apollon Oikonomopoulos <apoikos@debian.org>
To: 877418@bugs.debian.org, lamby@debian.org
Subject: Re: Bug#877418: dh-strip-nondeterminism: kills clojure performance
Date: Tue, 3 Oct 2017 14:59:08 +0300
Hi Chris,

On Tue, 03 Oct 2017 09:32:39 +0100 Chris Lamb <lamby@debian.org> wrote:
> Hi Emmanuel,
> 
> > I eventually found this check performed in the load() method of RT.java:
> > 
> >   if((classURL != null &&
> >       (cljURL == null
> >          || lastModified(classURL, classfile) > lastModified(cljURL, scriptfile)))
> > 
> > Changing '>' with '>=' fixes the issue.
> 
> Great stuff! So, we have two options as I see it:
> 
>   a) We patch clojure with ">="  (and send it upstream, etc. etc.)
> 
>   b) We make strip-nondeterminism subtract 1 second from the .clj files'
>      target modification times so it matches with the existing ">".
> 
> My preference is for "a)", naturally...

I'm afraid a) is not the correct solution here. If you want to make sure 
that the bytecode is strictly newer than the source, you *have* to 
re-compile if they have the same mtime. This is especially true when 
taking into account that the mtime resolution is finite (and pretty 
coarse indeed in cases like ext3). Setting the mtime of .clj files one 
second earlier than .class should Do The Right Thing™.
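
A minimal sketch of that idea, using Python's zipfile module rather than the Perl handler. The timestamps and the function names are assumptions made for this example. It uses a two-second offset rather than one, because the classic ZIP timestamp field stores seconds in two-second steps, so a one-second difference may not survive being written to the archive.

```python
import io
import zipfile

# Assumed values, for illustration only.
CANONICAL = (2017, 10, 1, 12, 0, 0)       # given to ordinary members
CLJ_EARLIER = (2017, 10, 1, 11, 59, 58)   # two seconds earlier, for .clj members


def normalize(jar_bytes):
    """Return a copy of the archive with fixed member timestamps,
    keeping .clj members strictly older than everything else."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as src, \
            zipfile.ZipFile(out, "w") as dst:
        for info in src.infolist():
            stamp = CLJ_EARLIER if info.filename.endswith(".clj") else CANONICAL
            new_info = zipfile.ZipInfo(info.filename, date_time=stamp)
            new_info.compress_type = info.compress_type
            new_info.external_attr = info.external_attr
            dst.writestr(new_info, src.read(info.filename))
    return out.getvalue()


def build_demo_jar():
    """Build a tiny in-memory archive whose two members share one timestamp."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        for name in ("foo.clj", "foo.class"):
            z.writestr(zipfile.ZipInfo(name, date_time=(2017, 10, 3, 9, 0, 0)), b"demo")
    return buf.getvalue()


with zipfile.ZipFile(io.BytesIO(normalize(build_demo_jar()))) as z:
    stamps = {i.filename: i.date_time for i in z.infolist()}

print(stamps["foo.clj"] < stamps["foo.class"])  # True
```

With the source strictly older, a strict "newer than" check on the .class file succeeds without changing the comparison itself.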

Cheers,
Apollon



Information forwarded to debian-bugs-dist@lists.debian.org, Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>:
Bug#877418; Package dh-strip-nondeterminism. (Tue, 03 Oct 2017 14:03:08 GMT) (full text, mbox, link).


Acknowledgement sent to Chris Lamb <lamby@debian.org>:
Extra info received and forwarded to list. Copy sent to Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>. (Tue, 03 Oct 2017 14:03:08 GMT) (full text, mbox, link).


Message #47 received at 877418@bugs.debian.org (full text, mbox, reply):

From: Chris Lamb <lamby@debian.org>
To: 877418@bugs.debian.org
Cc: Rob Browning <rlb@defaultvalue.org>
Subject: Re: dh-strip-nondeterminism: kills clojure performance
Date: Tue, 03 Oct 2017 15:00:11 +0100
tags 877418 + pending
thanks

> Setting the mtime of .clj files one second earlier than .class
> should Do The Right Thing™.

Thanks. I've just pushed the following:

  https://anonscm.debian.org/git/reproducible/strip-nondeterminism.git/commit/?id=99af63bec965d924275d53f4db90f9853e4db8a7

  https://anonscm.debian.org/git/reproducible/strip-nondeterminism.git/commit/?id=3f92d1b3d5cfc7b9b82cec176b3e602d0a34fbaf

  https://anonscm.debian.org/git/reproducible/strip-nondeterminism.git/commit/?id=dec86231ce51db87d28db35fbedb9c887db569fd

  https://anonscm.debian.org/git/reproducible/strip-nondeterminism.git/commit/?id=7691e2980274c1b041b8730bae5a8f5374cbcbf1


Regards,

-- 
      ,''`.
     : :'  :     Chris Lamb
     `. `'`      lamby@debian.org / chris-lamb.co.uk
       `-



Added tag(s) pending. Request was from Chris Lamb <lamby@debian.org> to control@bugs.debian.org. (Tue, 03 Oct 2017 14:03:10 GMT) (full text, mbox, link).


Information forwarded to debian-bugs-dist@lists.debian.org, Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>:
Bug#877418; Package dh-strip-nondeterminism. (Tue, 03 Oct 2017 14:27:03 GMT) (full text, mbox, link).


Acknowledgement sent to Apollon Oikonomopoulos <apoikos@debian.org>:
Extra info received and forwarded to list. Copy sent to Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>. (Tue, 03 Oct 2017 14:27:03 GMT) (full text, mbox, link).


Message #54 received at 877418@bugs.debian.org (full text, mbox, reply):

From: Apollon Oikonomopoulos <apoikos@debian.org>
To: 877418@bugs.debian.org, lamby@debian.org
Subject: Re: dh-strip-nondeterminism: kills clojure performance
Date: Tue, 3 Oct 2017 17:23:03 +0300
Hi Chris,

On Tue, 03 Oct 2017 15:00:11 +0100 Chris Lamb <lamby@debian.org> wrote:
> tags 877418 + pending
> thanks
> 
> > Setting the mtime of .clj files one second earlier than .class
> > should Do The Right Thing™.
> 
> Thanks. I've just pushed the following:
> 
>   https://anonscm.debian.org/git/reproducible/strip-nondeterminism.git/commit/?id=99af63bec965d924275d53f4db90f9853e4db8a7
> 
>   https://anonscm.debian.org/git/reproducible/strip-nondeterminism.git/commit/?id=3f92d1b3d5cfc7b9b82cec176b3e602d0a34fbaf
> 
>   https://anonscm.debian.org/git/reproducible/strip-nondeterminism.git/commit/?id=dec86231ce51db87d28db35fbedb9c887db569fd
> 
>   https://anonscm.debian.org/git/reproducible/strip-nondeterminism.git/commit/?id=7691e2980274c1b041b8730bae5a8f5374cbcbf1

Thanks for fixing this. Just a small comment: the comment in [1] should probably say "to
always be older than .class", instead of "to always be younger".

Cheers,
Apollon

[1] https://anonscm.debian.org/git/reproducible/strip-nondeterminism.git/commit/?id=7691e2980274c1b041b8730bae5a8f5374cbcbf1



Information forwarded to debian-bugs-dist@lists.debian.org, Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>:
Bug#877418; Package dh-strip-nondeterminism. (Tue, 03 Oct 2017 14:39:06 GMT) (full text, mbox, link).


Acknowledgement sent to Chris Lamb <lamby@debian.org>:
Extra info received and forwarded to list. Copy sent to Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>. (Tue, 03 Oct 2017 14:39:06 GMT) (full text, mbox, link).


Message #59 received at 877418@bugs.debian.org (full text, mbox, reply):

From: Chris Lamb <lamby@debian.org>
To: Apollon Oikonomopoulos <apoikos@debian.org>, 877418@bugs.debian.org
Subject: Re: Bug#877418: dh-strip-nondeterminism: kills clojure performance
Date: Tue, 03 Oct 2017 15:37:13 +0100
Hi Apollon,


> Thanks for fixing this. Just a small comment: the comment in [1]
> should probably say "to always be older than .class", instead of
> "to always be younger".

Good idea; pushed in:

  https://anonscm.debian.org/git/reproducible/strip-nondeterminism.git/commit/?id=cb9261d05e6891153f3d44ad2cc6c0e3184dbc60


Best wishes,

-- 
      ,''`.
     : :'  :     Chris Lamb
     `. `'`      lamby@debian.org / chris-lamb.co.uk
       `-



Information forwarded to debian-bugs-dist@lists.debian.org, Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>:
Bug#877418; Package dh-strip-nondeterminism. (Tue, 03 Oct 2017 17:18:04 GMT) (full text, mbox, link).


Acknowledgement sent to Emmanuel Bourg <ebourg@apache.org>:
Extra info received and forwarded to list. Copy sent to Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>. (Tue, 03 Oct 2017 17:18:04 GMT) (full text, mbox, link).


Message #64 received at 877418@bugs.debian.org (full text, mbox, reply):

From: Emmanuel Bourg <ebourg@apache.org>
To: Chris Lamb <lamby@debian.org>, Emmanuel Bourg <emmanuel.bourg@gmail.com>, 877418@bugs.debian.org
Subject: Re: Bug#877418: dh-strip-nondeterminism: kills clojure performance
Date: Tue, 3 Oct 2017 19:08:17 +0200
Le 3/10/2017 à 10:32, Chris Lamb a écrit :

> Great stuff! So, we have two options as I see it:
> 
>   a) We patch clojure with ">="  (and send it upstream, etc. etc.)
> 
>   b) We make strip-nondeterminism subtract 1 second from the .clj files'
>      target modification times so it matches with the existing ">".
> 
> My preference is for "a)", naturally...

I thought about b) too but this is definitely a clojure bug.

Emmanuel Bourg



Information forwarded to debian-bugs-dist@lists.debian.org, Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>:
Bug#877418; Package dh-strip-nondeterminism. (Tue, 03 Oct 2017 17:51:08 GMT) (full text, mbox, link).


Acknowledgement sent to Chris Lamb <lamby@debian.org>:
Extra info received and forwarded to list. Copy sent to Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>. (Tue, 03 Oct 2017 17:51:08 GMT) (full text, mbox, link).


Message #69 received at 877418@bugs.debian.org (full text, mbox, reply):

From: Chris Lamb <lamby@debian.org>
To: Emmanuel Bourg <ebourg@apache.org>, 877418@bugs.debian.org, Emmanuel Bourg <emmanuel.bourg@gmail.com>
Subject: Re: Bug#877418: dh-strip-nondeterminism: kills clojure performance
Date: Tue, 03 Oct 2017 18:49:39 +0100
Hi Emmanuel,

> >   a) We patch clojure with ">="  (and send it upstream, etc. etc.)
> > 
> >   b) We make strip-nondeterminism subtract 1 second from the .clj files'
> >      target modification times so it matches with the existing ">".
> 
> I thought about b) too but this is definitely a clojure bug.

So, did you see Apollon's remarks to this bug? You seem to
disagree on where the bug is.

Very happy to rollback the changes to strip-nondeterminism that
implement b) if we go with a) in the end; I haven't uploaded yet.

Can we come to some conclusion here? :)


Regards,

-- 
      ,''`.
     : :'  :     Chris Lamb
     `. `'`      lamby@debian.org / chris-lamb.co.uk
       `-



Information forwarded to debian-bugs-dist@lists.debian.org, Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>:
Bug#877418; Package dh-strip-nondeterminism. (Tue, 03 Oct 2017 22:45:03 GMT) (full text, mbox, link).


Acknowledgement sent to Emmanuel Bourg <ebourg@apache.org>:
Extra info received and forwarded to list. Copy sent to Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>. (Tue, 03 Oct 2017 22:45:03 GMT) (full text, mbox, link).


Message #74 received at 877418@bugs.debian.org (full text, mbox, reply):

From: Emmanuel Bourg <ebourg@apache.org>
To: Chris Lamb <lamby@debian.org>, 877418@bugs.debian.org
Subject: Re: Bug#877418: dh-strip-nondeterminism: kills clojure performance
Date: Wed, 4 Oct 2017 00:41:00 +0200
Le 3/10/2017 à 19:49, Chris Lamb a écrit :

> So, did you see Apollon's remarks to this bug?

I guess the messages were delayed, I didn't see them yesterday.


> Very happy to rollback the changes to strip-nondeterminism that
> implement b) if we go with a) in the end; I haven't uploaded yet.
> 
> Can we come to some conclusion here? :)

Fixing our clojure package would only solve the issue for Debian. If
strip-nondeterminism is also meant to be used outside Debian it's
probably worth keeping the tweak for .clj files until upstream addresses
the issue (at least for resources in jar files I think).

Emmanuel Bourg



Information forwarded to debian-bugs-dist@lists.debian.org, Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>:
Bug#877418; Package dh-strip-nondeterminism. (Wed, 04 Oct 2017 07:06:02 GMT) (full text, mbox, link).


Acknowledgement sent to Chris Lamb <lamby@debian.org>:
Extra info received and forwarded to list. Copy sent to Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>. (Wed, 04 Oct 2017 07:06:03 GMT) (full text, mbox, link).


Message #79 received at 877418@bugs.debian.org (full text, mbox, reply):

From: Chris Lamb <lamby@debian.org>
To: Emmanuel Bourg <ebourg@apache.org>, 877418@bugs.debian.org
Subject: Re: Bug#877418: dh-strip-nondeterminism: kills clojure performance
Date: Wed, 04 Oct 2017 08:03:05 +0100
Hi Emmanuel,

> Fixing our clojure package would only solve the issue for Debian.

Sure, but why don't we patch Debian's version of clojure whilst we wait
for upstream to "catch up"? :-)


Best wishes,

-- 
      ,''`.
     : :'  :     Chris Lamb
     `. `'`      lamby@debian.org / chris-lamb.co.uk
       `-



Information forwarded to debian-bugs-dist@lists.debian.org, Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>:
Bug#877418; Package dh-strip-nondeterminism. (Wed, 04 Oct 2017 08:54:03 GMT) (full text, mbox, link).


Acknowledgement sent to Vincent Bernat <bernat@debian.org>:
Extra info received and forwarded to list. Copy sent to Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>. (Wed, 04 Oct 2017 08:54:03 GMT) (full text, mbox, link).


Message #84 received at 877418@bugs.debian.org (full text, mbox, reply):

From: Vincent Bernat <bernat@debian.org>
To: Chris Lamb <lamby@debian.org>
Cc: Emmanuel Bourg <ebourg@apache.org>, 877418@bugs.debian.org
Subject: Re: Bug#877418: dh-strip-nondeterminism: kills clojure performance
Date: Wed, 04 Oct 2017 10:43:27 +0200
 ❦  4 octobre 2017 08:03 +0100, Chris Lamb <lamby@debian.org> :

>> Fixing our clojure package would only solve the issue for Debian.
>
> Sure, but why don't we patch Debian's version of clojure whilst we wait
> for upstream to "catch up"? :-)

FWIW, I agree with Apollon. >= is better than > as the resolution of the
timestamp can be coarse. I am also worried Clojure may not be the only
one using this. For example, Python may use the same thing for pyc (just
checked, it doesn't, it uses not(>=)). We don't ship pyc in packages,
but there may be other things like that.
-- 
Make sure comments and code agree.
            - The Elements of Programming Style (Kernighan & Plauger)

Information forwarded to debian-bugs-dist@lists.debian.org, Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>:
Bug#877418; Package dh-strip-nondeterminism. (Wed, 04 Oct 2017 12:06:06 GMT) (full text, mbox, link).


Acknowledgement sent to Chris Lamb <lamby@debian.org>:
Extra info received and forwarded to list. Copy sent to Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>. (Wed, 04 Oct 2017 12:06:06 GMT) (full text, mbox, link).


Message #89 received at 877418@bugs.debian.org (full text, mbox, reply):

From: Chris Lamb <lamby@debian.org>
To: Vincent Bernat <bernat@debian.org>
Cc: Emmanuel Bourg <ebourg@apache.org>, 877418@bugs.debian.org
Subject: Re: Bug#877418: dh-strip-nondeterminism: kills clojure performance
Date: Wed, 04 Oct 2017 13:02:04 +0100
Hi Vincent,

> FWIW, I agree with Apollon. >= is better than > as the resolution of the
> timestamp can be coarse.

Mm, I agree. Also, as strip-nondeterminism should really "go away" in the
medium- to long- term, I'd rather avoid adding ad-hoc modifications
(especially ones so ugly) for each language environment that can suffer
this issue.

> I am also worried Clojure may not be the only one using this.

I've just posted to -devel on this topic so this gets more exposure:

  https://lists.debian.org/debian-devel/2017/10/msg00073.html
 

Regards,

-- 
      ,''`.
     : :'  :     Chris Lamb
     `. `'`      lamby@debian.org / chris-lamb.co.uk
       `-



Information forwarded to debian-bugs-dist@lists.debian.org, Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>:
Bug#877418; Package dh-strip-nondeterminism. (Wed, 04 Oct 2017 18:36:02 GMT) (full text, mbox, link).


Acknowledgement sent to Elana Hashman <debian@hashman.ca>:
Extra info received and forwarded to list. Copy sent to Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>. (Wed, 04 Oct 2017 18:36:03 GMT) (full text, mbox, link).


Message #94 received at 877418@bugs.debian.org (full text, mbox, reply):

From: Elana Hashman <debian@hashman.ca>
To: 877418@bugs.debian.org
Subject: Re: dh-strip-nondeterminism: kills clojure performance
Date: Wed, 4 Oct 2017 14:25:50 -0400
Hi all, just catching up on this thread.

FWIW I agree with Apollon; as rlb pointed out on IRC, we introduce a 
potential race condition when we don't recompile when the timestamps are 
equal. Quoting him...

    The scenario I was thinking of is "clock ticks over to 1001s, I 
    compile foo.clj -> foo.class, then I edit foo.clj, then clock ticks 
    over to 1002s, and we make a jar", but the filesystem says both the 
    .clj and the .class are mtime 1001s even though foo.clj is 
    different. This example assumes a filesystem with 1s mtime 
    resolution.

Unlikely for a human editing files, of course, but could be problematic 
with e.g. automated build processes.
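
The scenario can be written out as a small Python model. The helper name and the sub-second times are invented for illustration; the only assumption carried over from the quote is a filesystem that records mtimes with 1s resolution.

```python
def stored_mtime(real_time, resolution=1.0):
    """Timestamp that a filesystem with the given resolution would record
    (the real time rounded down to a multiple of the resolution)."""
    return int(real_time // resolution * resolution)


class_written = 1001.2  # foo.clj is compiled to foo.class
clj_edited = 1001.7     # foo.clj is edited afterwards, within the same second

class_mtime = stored_mtime(class_written)
clj_mtime = stored_mtime(clj_edited)

# Both files end up with mtime 1001 even though foo.clj is newer.
use_class_strict = class_mtime > clj_mtime      # existing check
use_class_inclusive = class_mtime >= clj_mtime  # proposed '>=' check

print(class_mtime, clj_mtime)  # 1001 1001
print(use_class_strict)        # False: recompiles, so the edit is picked up
print(use_class_inclusive)     # True: would load the now-stale foo.class
```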

As such I don't actually know if Clojure upstream would be willing to 
accept the patch. I can submit just to see what they say? I'm honestly 
not sure they'll consider this a bug.

Having chatted with Phil Hagelberg (author of leiningen) as well, he 
suggests we go for solution b) or something similar, as he believes this 
to be a packaging concern as opposed to a core language problem.

- e



Information forwarded to debian-bugs-dist@lists.debian.org, Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>:
Bug#877418; Package dh-strip-nondeterminism. (Wed, 04 Oct 2017 18:48:05 GMT) (full text, mbox, link).


Acknowledgement sent to Chris Lamb <lamby@debian.org>:
Extra info received and forwarded to list. Copy sent to Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>. (Wed, 04 Oct 2017 18:48:05 GMT) (full text, mbox, link).


Message #99 received at 877418@bugs.debian.org (full text, mbox, reply):

From: Chris Lamb <lamby@debian.org>
To: Elana Hashman <debian@hashman.ca>, 877418@bugs.debian.org
Subject: Re: Bug#877418: dh-strip-nondeterminism: kills clojure performance
Date: Wed, 04 Oct 2017 19:45:49 +0100
Hi Elana,

> Hi all, just catching up on this thread.

No problem, great to see more people adding their thoughts! :)

>  […] assumes a filesystem with 1s mtime resolution.

Mmm, which is a completely fair assumption. See also:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=804339 !

> Having chatted with Phil Hagelberg (author of leiningen) as well, he 
> suggests we go for solution b) or something similar, as he believes this 
> to be a packaging concern as opposed to a core language problem.

In the abstract, I would agree; it is "just" a packaging problem that
we've caused ourselves.

However, do we really want to maintain a list of ".class" → ".clj"
mappings to hack around, essentially forever? :)

Further to this, in an ideal world, strip-nondeterminism should not (and
will not!) exist. Indeed, I love to *remove* handlers/features from it as
they get merged upstream.

*Very* quick thoughts here: could some variant of a) be merged
upstream…? Perhaps upstream could move to a hash-based system instead
of using timestamps? eg. encoding the SHA1 of the file in the filename.
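
Purely to illustrate what a hash-based check could look like (this is a hypothetical scheme sketched in Python, not anything Clojure does or has agreed to; the naming convention and function names are invented):

```python
import hashlib


def source_digest(source_bytes):
    """SHA1 of the source file's contents, as a hex string."""
    return hashlib.sha1(source_bytes).hexdigest()


def compiled_name(stem, source_bytes):
    """Hypothetical naming scheme: embed the source digest in the
    compiled file's name."""
    return f"{stem}-{source_digest(source_bytes)}.class"


def is_up_to_date(compiled_filename, stem, source_bytes):
    """The compiled file is current only if its name matches the digest
    of the source as it is now; timestamps play no part."""
    return compiled_filename == compiled_name(stem, source_bytes)


original = b"(ns foo)\n"
edited = b"(ns foo)\n(def x 1)\n"

name = compiled_name("foo", original)
print(is_up_to_date(name, "foo", original))  # True
print(is_up_to_date(name, "foo", edited))    # False: the source changed
```

Such a check would be unaffected by whatever strip-nondeterminism does to modification times, at the cost of hashing the source on every load.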


Best wishes,

-- 
      ,''`.
     : :'  :     Chris Lamb
     `. `'`      lamby@debian.org / chris-lamb.co.uk
       `-



Information forwarded to debian-bugs-dist@lists.debian.org, Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>:
Bug#877418; Package dh-strip-nondeterminism. (Wed, 04 Oct 2017 19:12:04 GMT) (full text, mbox, link).


Acknowledgement sent to Phil Hagelberg <phil@hagelb.org>:
Extra info received and forwarded to list. Copy sent to Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>. (Wed, 04 Oct 2017 19:12:04 GMT) (full text, mbox, link).


Message #104 received at 877418@bugs.debian.org (full text, mbox, reply):

From: Phil Hagelberg <phil@hagelb.org>
To: 877418@bugs.debian.org
Subject: Re: dh-strip-nondeterminism: kills clojure performance
Date: Wed, 04 Oct 2017 12:09:57 -0700
Hi; I'm the upstream maintainer of Leiningen, a Clojure application
being packaged for Debian.

I would strongly vote for adjusting the timestamps of .clj files to be
older than the corresponding .class files.

I don't know enough about filesystem timestamp granularity to comment on
the wisdom of >= vs >, but I do know that patches to Clojure from
outsiders (myself included) often take years to get applied (if ever)
and the value of maintaining compatibility with older versions of
Clojure shouldn't be underestimated.

Users of Leiningen will pull in whatever version of Clojure is specified
by their application (usually not the same one as is packaged by
Debian), and if jars from the Debian repository end up packaged with
the assumption that they are consumed with a >=-patched Clojure, this
will cause a lot of subtle confusion.

-Phil

Information forwarded to debian-bugs-dist@lists.debian.org, Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>:
Bug#877418; Package dh-strip-nondeterminism. (Thu, 05 Oct 2017 00:18:02 GMT) (full text, mbox, link).


Acknowledgement sent to Rob Browning <rlb@defaultvalue.org>:
Extra info received and forwarded to list. Copy sent to Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>. (Thu, 05 Oct 2017 00:18:02 GMT) (full text, mbox, link).


Message #109 received at 877418@bugs.debian.org (full text, mbox, reply):

From: Rob Browning <rlb@defaultvalue.org>
To: Chris Lamb <lamby@debian.org>, 877418@bugs.debian.org
Subject: Re: Bug#877418: dh-strip-nondeterminism: kills clojure performance
Date: Wed, 04 Oct 2017 19:15:19 -0500
> Chris Lamb <lamby@debian.org> writes:


>>  […] assumes a filesystem with 1s mtime resolution.

> Mmm, which is a completely fair assumption. See also:
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=804339 !

While I did mention filesystem timestamps on IRC as an example, and they
are relevant for, say, make, do they matter here?

Or rather, if Clojure's only looking at the timestamps in the jar file,
then those may have a known (fixed) resolution, and so we'd just need to
make sure that the .clj files are at least that much older than the
corresponding .class files inside the jar.

Though I'd probably still pick 1s or more just so that an unpacked jar
will still have the right timestamp ordering on the vast majority of
filesystems.

Or perhaps we're not (re)building the jar(zip) manually, but building a
new one after round-tripping the files through the current filesystem?

...in which case perhaps an offset of a second or more is still
sufficient.
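
On the question of what resolution the jar itself preserves: the classic ZIP timestamp field stores seconds in 2 second steps, so a 1 second offset can vanish when only that field is written. The following is a minimal Python sketch, not taken from strip-nondeterminism; the entry names are hypothetical and it assumes no extended timestamp extra fields are in use (Python's zipfile does not write them).

```python
import io
import zipfile


def stored_times(entries):
    """Write (name, date_time) pairs to an in-memory zip and read them back."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, date_time in entries:
            zf.writestr(zipfile.ZipInfo(name, date_time=date_time), b"")
    with zipfile.ZipFile(buf) as zf:
        return {info.filename: info.date_time for info in zf.infolist()}


# Hypothetical entries: the .clj is given an mtime 1s older than the .class.
times = stored_times([
    ("demo/core.clj", (2017, 10, 4, 19, 15, 18)),
    ("demo/core__init.class", (2017, 10, 4, 19, 15, 19)),
])

# Odd seconds are rounded down to the even second below, so both entries
# read back with seconds == 18 and the 1s offset is gone.
print(times)
```

Under that assumption, an offset of at least 2 seconds is what reliably survives a round trip through the archive.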

-- 
Rob Browning
rlb @defaultvalue.org and @debian.org
GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A
GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4



Information forwarded to debian-bugs-dist@lists.debian.org, Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>:
Bug#877418; Package dh-strip-nondeterminism. (Thu, 05 Oct 2017 10:00:04 GMT) (full text, mbox, link).


Acknowledgement sent to Chris Lamb <lamby@debian.org>:
Extra info received and forwarded to list. Copy sent to Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>. (Thu, 05 Oct 2017 10:00:04 GMT) (full text, mbox, link).


Message #114 received at 877418@bugs.debian.org (full text, mbox, reply):

From: Chris Lamb <lamby@debian.org>
To: Rob Browning <rlb@defaultvalue.org>, 877418@bugs.debian.org
Subject: Re: Bug#877418: dh-strip-nondeterminism: kills clojure performance
Date: Thu, 05 Oct 2017 10:56:46 +0100
Hi Rob,

> Or rather, if Clojure's only looking at the timestamps in the jar file,
> then those may have a known (fixed) resolution, and so we'd just need to
> make sure that the .clj files are at least that much older than the
> corresponding .class files inside the jar.

Right; that's:

> >  b) We make strip-nondeterminism subtract 1 second from the .clj files'
> >     target modification times so it matches with the existing ">".

.. is it not? :)

> Though I'd probably still pick 1s or more just so that an unpacked jar
> will still have the right timestamp ordering on the vast majority of
> filesystems.

I don't quite get what you mean, I'm afraid. Filesystem ordering (at least
via readdir/listdir, etc.) is non-deterministic. Can you explain it to me
another way? I'd also be curious to know why you think *more* than one
second could ever be needed here. I think I'm missing something.


Regards,

-- 
      ,''`.
     : :'  :     Chris Lamb
     `. `'`      lamby@debian.org / chris-lamb.co.uk
       `-



Information forwarded to debian-bugs-dist@lists.debian.org, Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>:
Bug#877418; Package dh-strip-nondeterminism. (Thu, 05 Oct 2017 19:42:18 GMT) (full text, mbox, link).


Acknowledgement sent to Rob Browning <rlb@defaultvalue.org>:
Extra info received and forwarded to list. Copy sent to Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>. (Thu, 05 Oct 2017 19:42:18 GMT) (full text, mbox, link).


Message #119 received at 877418@bugs.debian.org (full text, mbox, reply):

From: Rob Browning <rlb@defaultvalue.org>
To: Chris Lamb <lamby@debian.org>, 877418@bugs.debian.org
Subject: Re: Bug#877418: dh-strip-nondeterminism: kills clojure performance
Date: Thu, 05 Oct 2017 14:40:40 -0500
Chris Lamb <lamby@debian.org> writes:

> I don't quite get what you mean I'm afraid. Filesystem ordering (at least
> via readdir/listdir, etc.) is non-deterministic. Can you explain it to me
> another way?

(...or quite likely I'm not describing things all that well.)

In Clojure's case, I'd think that setting the .clj mtime to at least 1s
before the corresponding .class file in the jar should work fine, though
if Clojure's only consulting the jar, then any other offset that
registers as smaller should also work, i.e. it might not have to be a
full second inside jars.

But sticking with at least 1s should make things a bit more general
because then if you

  jar xf foo.jar

the resulting tree will still show the right relative offsets on common
filesystems (assuming "jar x" tries to preserve mtimes) so that any
tool, clojure, some clojure build tool, etc. will still work as expected
with the tree.


...then I started thinking more generally and wondered if (eventually)
we might be able to do something even more broadly helpful.

If we were to take any archive we're rewriting (tar, jar, cpio), and
sort all the files by decreasing mtime, then assign the set of files
with the largest mtime to have some mtime_0, assign the set of files
with the second largest mtime to have (mtime_0 - 1s), the third set to
(mtime_0 - 2s), etc., we'd preserve the overall ordering among the
files so that something like:

   tar xf some-reproducible-archive.tgz
   cd some-reproducible-archive
   make

would stand a good chance of just working as it would have with the
original archive.
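
The ranking idea above can be sketched in a few lines of Python. This is only an illustration of the proposal, not something strip-nondeterminism implements; the function name `rank_mtimes`, the `step` parameter and the sample paths are all hypothetical.

```python
def rank_mtimes(mtimes, mtime_0, step=1):
    """Map each path to mtime_0 - rank * step.

    Rank 0 is the group of files sharing the newest original mtime,
    rank 1 the next newest group, and so on, so relative order is kept.
    """
    distinct = sorted(set(mtimes.values()), reverse=True)
    rank = {t: i for i, t in enumerate(distinct)}
    return {path: mtime_0 - rank[t] * step for path, t in mtimes.items()}


# Hypothetical archive contents with arbitrary original mtimes (in seconds).
original = {"Makefile": 100, "foo.c": 100, "foo.o": 250, "foo": 300}
normalized = rank_mtimes(original, mtime_0=1000)
print(normalized)
```

Files that shared an mtime still share one afterwards, and anything that was strictly newer stays strictly newer by at least `step` seconds. A larger `step` could be chosen for archive formats or filesystems with coarser granularity.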

> I'd also be curious to know why you think *more* than one second could
> ever be needed here. I think I'm missing something.

I suspect 1s is just fine, and I have nothing concrete in mind here --
it just made me think of the general floating point issues (if any end
up involved in the path), e.g. 4.000...1 vs 4 vs 3.999... vs
rounding/truncation to the final value, etc.

Thanks
-- 
Rob Browning
rlb @defaultvalue.org and @debian.org
GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A
GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4



Information forwarded to debian-bugs-dist@lists.debian.org, Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>:
Bug#877418; Package dh-strip-nondeterminism. (Thu, 05 Oct 2017 19:48:02 GMT) (full text, mbox, link).


Acknowledgement sent to Daniel Kahn Gillmor <dkg@fifthhorseman.net>:
Extra info received and forwarded to list. Copy sent to Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>. (Thu, 05 Oct 2017 19:48:02 GMT) (full text, mbox, link).


Message #124 received at 877418@bugs.debian.org (full text, mbox, reply):

From: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
To: Chris Lamb <lamby@debian.org>, 877418@bugs.debian.org, Elana Hashman <debian@hashman.ca>
Subject: Re: Bug#877418: dh-strip-nondeterminism: kills clojure performance
Date: Thu, 05 Oct 2017 12:45:06 -0700
On Wed 2017-10-04 19:45:49 +0100, Chris Lamb wrote:
> *Very* quick thoughts here: could some variant of a) be merged
> upstream…? Perhaps upstream could move to a hash-based system instead
> of using timestamps? eg. encoding the SHA1 of the file in the filename.

I'm thinking about this problem more generally than clojure
specifically -- other folks have raised python's .py → .pyc mappings and
i'm sure there are other similar frameworks.  I want to make sure we're
thinking about the various places that these checks happen.

It may also matter whether we're talking about a file stored in an archive
vs. one stored in the filesystem.  different archive formats and
different filesystems have different timestamp granularity (iirc, FAT
has 2s granularity, for example).

And there are more questions too: what if multiple source files
contributed to the creation of the compiled artifact (e.g. "include"
directives)?

You can also imagine a compilation regime that detects changes to a file
(e.g. via inotify) and immediately triggers recompilation -- with a fast
compiler and a coarse filesystem/archive timestamp, such a regime would
end up in the same situation (serious performance impact).

And of course, it's always possible to (accidentally or intentionally)
just "touch" the timestamps on a totally different bytecode file of the
appropriate name to trick or confuse this optimization step.

There are also problems with the digest based approach that lamby
suggests: it's significantly more expensive to do a full source
extraction and digest than it is to compare timestamp metadata.

--

So i think we have to ask what the goal of this check is from the upstream
platform's point of view:

 * is it strong assurance that the file was built from the
   exposed source?

 * is it a speedy (if fallible) sanity check?

i think that it can't really be the former (because of all the corner
cases outlined above), so the question is what kind of failure modes and
risks they're willing to tolerate.  Those that want absolute assurance
will be obliged to recompile each time unless they have some sort of
externally-audited mapping/manifest.

It sounds to me like python has made a sensible tradeoff (accepting that
equal timestamps means OK) and clojure has made a decision that tries to
get more of a guarantee than they can actually get, and sacrificed
performance for it.
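
To make the contrast concrete, here is a minimal Python sketch of the two comparison policies as they are described in this thread. The function names are hypothetical and neither function is the actual Clojure or Python implementation; they only model "strictly newer" versus "newer or equal".

```python
def use_compiled_strict(src_mtime, compiled_mtime):
    """Accept the compiled file only if it is strictly newer than the source."""
    return compiled_mtime > src_mtime


def use_compiled_lenient(src_mtime, compiled_mtime):
    """Accept the compiled file if it is at least as new as the source."""
    return compiled_mtime >= src_mtime


# After normalization every member of the archive carries the same mtime.
normalized = 1507000000
print(use_compiled_strict(normalized, normalized))   # False: recompile at load time
print(use_compiled_lenient(normalized, normalized))  # True: load the compiled file
```

With identical timestamps the strict policy falls back to compiling from source on every start, which is the slowdown reported in this bug.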

            --dkg



Information forwarded to debian-bugs-dist@lists.debian.org, Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>:
Bug#877418; Package dh-strip-nondeterminism. (Thu, 05 Oct 2017 20:00:06 GMT) (full text, mbox, link).


Acknowledgement sent to Daniel Kahn Gillmor <dkg@fifthhorseman.net>:
Extra info received and forwarded to list. Copy sent to Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>. (Thu, 05 Oct 2017 20:00:06 GMT) (full text, mbox, link).


Message #129 received at 877418@bugs.debian.org (full text, mbox, reply):

From: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
To: Chris Lamb <lamby@debian.org>, 877418@bugs.debian.org, Rob Browning <rlb@defaultvalue.org>, 877418@bugs.debian.org
Subject: Re: Bug#877418: dh-strip-nondeterminism: kills clojure performance
Date: Thu, 05 Oct 2017 12:57:45 -0700
On Thu 2017-10-05 10:56:46 +0100, Chris Lamb wrote:
> I'd also be curious to know why you think *more* than one second could
> ever be needed here. I think I'm missing something.

some filesystems have a resolution > 1s :(

  http://www.ntfs.com/exfat-comparison.htm

shows that FAT32 has a 2s granularity when used without extensions.
Looks like the Linux kernel remembers a 1sec granularity while still
mounted, but shows just the 2sec granularity across remounts:


   mkfs -t vfat $blkdev
   mount $blkdev /mnt
   for a in 1 2 3; do
      touch /mnt/$a
      sleep 1
   done
   stat /mnt/* | grep Modify
   umount /mnt
   mount $blkdev /mnt
   stat /mnt/* | grep Modify
   umount /mnt


produces two batches of mtime stats:

Modify: 2017-10-05 12:56:14.000000000 -0700
Modify: 2017-10-05 12:56:15.000000000 -0700
Modify: 2017-10-05 12:56:16.000000000 -0700

Modify: 2017-10-05 12:56:14.000000000 -0700
Modify: 2017-10-05 12:56:14.000000000 -0700
Modify: 2017-10-05 12:56:16.000000000 -0700
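
The effect can be modelled without a FAT volume. The sketch below is a simplified Python model that assumes the filesystem rounds mtimes down to a multiple of its granularity, which matches the second batch of output above; the helper names are hypothetical.

```python
def truncate(t, granularity):
    """Round a timestamp (in seconds) down to a multiple of the granularity."""
    return t - t % granularity


def offset_survives(offset, granularity, samples=range(10, 130)):
    """True if a source dated t - offset stays strictly older than t
    after truncation, for every sampled t."""
    return all(
        truncate(t - offset, granularity) < truncate(t, granularity)
        for t in samples
    )


print([truncate(s, 2) for s in (14, 15, 16)])  # [14, 14, 16], as after the remount
print(offset_survives(1, 2))  # False: a 1s offset can collapse on a 2s filesystem
print(offset_survives(2, 2))  # True: a 2s offset is always preserved
```

In this model the offset has to be at least as large as the coarsest granularity one wants to tolerate.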



      --dkg



Information forwarded to debian-bugs-dist@lists.debian.org, Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>:
Bug#877418; Package dh-strip-nondeterminism. (Fri, 06 Oct 2017 09:03:12 GMT) (full text, mbox, link).


Acknowledgement sent to Chris Lamb <lamby@debian.org>:
Extra info received and forwarded to list. Copy sent to Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>. (Fri, 06 Oct 2017 09:03:12 GMT) (full text, mbox, link).


Message #134 received at 877418@bugs.debian.org (full text, mbox, reply):

From: Chris Lamb <lamby@debian.org>
To: Daniel Kahn Gillmor <dkg@fifthhorseman.net>, 877418@bugs.debian.org, Elana Hashman <debian@hashman.ca>
Subject: Re: Bug#877418: dh-strip-nondeterminism: kills clojure performance
Date: Fri, 06 Oct 2017 10:00:27 +0100
Hi dkg,

> And there are more questions too: what if multiple source files
> contributed to the creation of the compiled artifact (e.g. "include"
> directives)?

Hm, that's an excellent point.

> You can also imagine a compilation regime that detects changes to a file
> (e.g. via inotify) and immediately triggers recompilation -- with a fast
> compiler and a coarse filesystem/archive timestamp, such a regime would
> end up in the same situation (serious performance impact).

Sure, but that doesn't seem like it would happen as part of a package
build?

> There are also problems with the digest based approach that lamby
> suggests: it's significantly more expensive to do a full source
> extraction and digest than it is to compare timestamp metadata.

If it were hardcoded into the filenames, one wouldn't need to do
anything onerous, eg.

  -rw-r--r-- 1 0 Oct  6 09:56 helloworld.adc83b19e793491b1c6ea0fd8b46cd9f32e592fc.class
  -rw-r--r-- 1 0 Oct  6 09:56 helloworld.adc83b19e793491b1c6ea0fd8b46cd9f32e592fc.clj

(Not entirely serious)
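
For illustration only, a minimal Python sketch of that naming scheme. The `stem.digest.extension` layout follows the listing above; the helper names are hypothetical and no language in this thread is known to work this way.

```python
import hashlib


def digest_name(stem, source_bytes, ext):
    """Build 'stem.<sha1 of the source>.ext' once, at build time."""
    return f"{stem}.{hashlib.sha1(source_bytes).hexdigest()}.{ext}"


def compiled_matches(source_name, compiled_name):
    """Compare the digests embedded in the two names without reading either file."""
    src_stem, src_digest, _ = source_name.rsplit(".", 2)
    cls_stem, cls_digest, _ = compiled_name.rsplit(".", 2)
    return (src_stem, src_digest) == (cls_stem, cls_digest)


source = b'(println "hello")\n'
clj = digest_name("helloworld", source, "clj")
cls = digest_name("helloworld", source, "class")
edited = digest_name("helloworld", source + b";; edited\n", "clj")

print(compiled_matches(clj, cls))     # True: same source digest in both names
print(compiled_matches(edited, cls))  # False: the source changed after compilation
```

The load-time check is then a string comparison; the cost of hashing is paid once by whatever writes the names, and the check is only as trustworthy as those names.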

> It sounds to me like python has made a sensible tradeoff (accepting that
> equal timestamps means OK)

Just to underline, Python in Debian would not be a problem even with <
unless you consider building a .deb with SOURCE_DATE_EPOCH="$(date +%s)"
and installing that very same .deb within the same second...

 … but I understand you were being more general about this topic!


Regards,

-- 
      ,''`.
     : :'  :     Chris Lamb
     `. `'`      lamby@debian.org / chris-lamb.co.uk
       `-



Information forwarded to debian-bugs-dist@lists.debian.org, Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>:
Bug#877418; Package dh-strip-nondeterminism. (Sat, 07 Oct 2017 00:48:02 GMT) (full text, mbox, link).


Acknowledgement sent to Daniel Kahn Gillmor <dkg@fifthhorseman.net>:
Extra info received and forwarded to list. Copy sent to Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>. (Sat, 07 Oct 2017 00:48:02 GMT) (full text, mbox, link).


Message #139 received at 877418@bugs.debian.org (full text, mbox, reply):

From: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
To: Chris Lamb <lamby@debian.org>, 877418@bugs.debian.org, Elana Hashman <debian@hashman.ca>
Subject: Re: Bug#877418: dh-strip-nondeterminism: kills clojure performance
Date: Fri, 06 Oct 2017 14:45:27 -0700
On Fri 2017-10-06 10:00:27 +0100, Chris Lamb wrote:
> If it were hardcoded into the filenames, one wouldn't need to do
> anything onerous, eg.
>
>   -rw-r--r-- 1 0 Oct  6 09:56 helloworld.adc83b19e793491b1c6ea0fd8b46cd9f32e592fc.class
>   -rw-r--r-- 1 0 Oct  6 09:56 helloworld.adc83b19e793491b1c6ea0fd8b46cd9f32e592fc.clj
>
> (Not entirely serious)

ah!  i hadn't even thought of that :)  I wonder whether any language
would consider such a construct.

> Just to underline, Python in Debian would not be a problem even with <
> unless you consider building a .deb with SOURCE_DATE_EPOCH="$(date +%s)"
> and installing that very same .deb within the same second...
>
>  … but I understand you were being more general about this topic!

yep, exactly -- i'm not saying that python is broken in debian, just
citing it as an example of another language that does the same kind of
thing, similarly to elisp, etc.

       --dkg



Information forwarded to debian-bugs-dist@lists.debian.org, Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>:
Bug#877418; Package dh-strip-nondeterminism. (Tue, 10 Oct 2017 19:21:08 GMT) (full text, mbox, link).


Acknowledgement sent to Paul Gevers <elbrus@debian.org>:
Extra info received and forwarded to list. Copy sent to Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>. (Tue, 10 Oct 2017 19:21:08 GMT) (full text, mbox, link).


Message #144 received at 877418@bugs.debian.org (full text, mbox, reply):

From: Paul Gevers <elbrus@debian.org>
To: 877418@bugs.debian.org
Subject: Re: Is strip-nondeterminism causing performance regressions in your packages?
Date: Tue, 10 Oct 2017 21:05:48 +0200
[Resending to the bug. It seems I am not allowed to post this message to
debian-devel@l.d.o. I first sent it on 2017-10-04 19:51 +0200]

Hi Chris,

On 04-10-17 13:43, Chris Lamb wrote:
>  a) Ship both compiled & source in the .deb
> 
> and
> 
>  b) Do a *runtime* timestamp comparison between the two.

Yes, FreePascal does such a thing. And it is causing me headaches
already. I introduced a helper script to fix some of the issues called
fp-fix-timestamps¹. It isn't great, but it does the job so far for
multiple packages. If strip-nondeterminism could support ppu files as
well, that would be awesome (I never realized that it could).

Paul

¹ https://manpages.debian.org/stretch/fp-utils/fp-fix-timestamps.1.en.html






Reply sent to Chris Lamb <lamby@debian.org>:
You have taken responsibility. (Fri, 20 Oct 2017 13:54:05 GMT) (full text, mbox, link).


Notification sent to Rob Browning <rlb@defaultvalue.org>:
Bug acknowledged by developer. (Fri, 20 Oct 2017 13:54:05 GMT) (full text, mbox, link).


Message #149 received at 877418-close@bugs.debian.org (full text, mbox, reply):

From: Chris Lamb <lamby@debian.org>
To: 877418-close@bugs.debian.org
Subject: Bug#877418: fixed in strip-nondeterminism 0.039-1
Date: Fri, 20 Oct 2017 13:50:48 +0000
Source: strip-nondeterminism
Source-Version: 0.039-1

We believe that the bug you reported is fixed in the latest version of
strip-nondeterminism, which is due to be installed in the Debian FTP archive.

A summary of the changes between this version and the previous one is
attached.

Thank you for reporting the bug, which will now be closed.  If you
have further comments please address them to 877418@bugs.debian.org,
and the maintainer will reopen the bug report if appropriate.

Debian distribution maintenance software
pp.
Chris Lamb <lamby@debian.org> (supplier of updated strip-nondeterminism package)

(This message was generated automatically at their request; if you
believe that there is a problem with it please contact the archive
administrators by mailing ftpmaster@ftp-master.debian.org)


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Format: 1.8
Date: Fri, 20 Oct 2017 09:11:36 -0400
Source: strip-nondeterminism
Binary: libfile-stripnondeterminism-perl strip-nondeterminism dh-strip-nondeterminism
Architecture: source
Version: 0.039-1
Distribution: unstable
Urgency: medium
Maintainer: Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>
Changed-By: Chris Lamb <lamby@debian.org>
Description:
 dh-strip-nondeterminism - file non-deterministic information stripper — Debhelper add-on
 libfile-stripnondeterminism-perl - file non-deterministic information stripper — Perl module
 strip-nondeterminism - file non-deterministic information stripper — stand-alone tool
Closes: 876140 877418
Changes:
 strip-nondeterminism (0.039-1) unstable; urgency=medium
 .
   [ Chris Lamb ]
   * Clojure considers the .class file to be stale if it shares the same
     timestamp of the .clj. We thus adjust the timestamps of the .clj to always
     be younger. (Closes: #877418)
     - {jar,zip}.pm: Allow $options{member_normalizer} callback to support
       specifying the timestamp.
     - zip.pm: Calculate the target canonical time in just one place.
     * zip.pm: Ensure that we don't try and write an old timestamp;
       Archive::Zip will do this anyway, just noisily.
   * dh_strip_nondeterminism: Log which handler processed a file.
     (Closes: #876140)
   * bin/strip-nondeterminism: Print a warning in --verbose mode if no
     canonical time specified.
   * debian/watch: Use HTTPS URI.
 .
   [ Holger Levsen ]
   * Bump Standards-Version to 4.1.1, no changes needed.
Checksums-Sha1:
 0a169f23020605b9f59b88bb76679264a5003f7c 2529 strip-nondeterminism_0.039-1.dsc
 039f60a1ca93aa2c4287105e081fc7e32b82a603 184630 strip-nondeterminism_0.039.orig.tar.bz2
 b43fcef35d0aab6bae46514ea8f44f097cba415a 12168 strip-nondeterminism_0.039-1.debian.tar.xz
 9008b9cc87ab5c2d49aa6429d8aa93af3f2130c9 6335 strip-nondeterminism_0.039-1_amd64.buildinfo
Checksums-Sha256:
 13cd98332d1866b470d5a11159e9954686aebf4a5b26a65debe5e43e8d1b56e4 2529 strip-nondeterminism_0.039-1.dsc
 90efde0dfcfa3dd64eb0b01bd4c0cbaea12b27330388e0ecb7b179089d049409 184630 strip-nondeterminism_0.039.orig.tar.bz2
 7e8034336d21e80faed20c8b74455028ac1567385b005e31768e7821aa4ab008 12168 strip-nondeterminism_0.039-1.debian.tar.xz
 e7526c421ec42132b612361a0bec6c38d4e925437bc9cfa4d9174b0cd091dad0 6335 strip-nondeterminism_0.039-1_amd64.buildinfo
Files:
 b9bd9a6bd8d3418be6835bc8365a4be1 2529 devel optional strip-nondeterminism_0.039-1.dsc
 7d63ccd120a643d58562437bb78c6e57 184630 devel optional strip-nondeterminism_0.039.orig.tar.bz2
 d2d6c3a3598bccd2cac8cf68f8763f2d 12168 devel optional strip-nondeterminism_0.039-1.debian.tar.xz
 59ae44dc3a7c403280d2103a5860041e 6335 devel optional strip-nondeterminism_0.039-1_amd64.buildinfo

-----BEGIN PGP SIGNATURE-----

iQIzBAEBCAAdFiEEwv5L0nHBObhsUz5GHpU+J9QxHlgFAlnp+0AACgkQHpU+J9Qx
HlgsnBAAlxcQPyVV2JaC7wXlrMC3pY+7syaUVtG2OEcV79ew0NQ6qzq3Hh2/mgh5
GQ+68r2uSLf2Ee593B74YPgkhcae3uNHaWAUYsX3mlzvnlhKVEKPMwJk1YVRrLmN
6qKY5cPOcjv01b6f9Mycd/Nr3JfiCHn1jFlDYFudqNruIaV+N6IsHsI3dMPXg+3x
d5wKeeR1d/fHmxULCW116CxM+H687dSmrS55RBjOUm5zMFGPTj6xR5MysM5xUifF
nsmlm2ULtAfoIMz9fL2Op6FtPwOctfzzUo5hKTOnL/gPyjx2UlJ9WOYObLW/d9d5
7iNdmHcL27pZbFbpcyuu7VBJAyNYdg3v/89VgZJOSI0KTVQ/9EBxjAit5bZjKtoe
BjX9v+uu0zt7cMS4xhTT5fwXQDTI9/sxmwuv1i304kYWc72EU6SqCgdKjp1XQ16C
y7/j34c5m3tLPPdsfufwUDZSWFpLIlUsbHHrqd+YGiA3xw5lfRDZrYp5LfWg2my0
23n3KadtSJW2kl0/S//qcPt6o9As/Ec87jhkbaHdpcezI90fa6v0Ce7NQ8ohOAox
6lDXjOm93+fy0FdnXoLDkqaT/1GU3O8H4vNdGYskm2EftKwA+b64x03BztUjoRnD
Z1HcTaNx/4A3tVFc52tamCA/qgyW2opg8gKtuSskTo1mVUWBdHs=
=VHej
-----END PGP SIGNATURE-----




Bug archived. Request was from Debbugs Internal Request <owner@bugs.debian.org> to internal_control@bugs.debian.org. (Wed, 22 Nov 2017 07:25:36 GMT) (full text, mbox, link).


Bug unarchived. Request was from Rob Browning <rlb@defaultvalue.org> to control@bugs.debian.org. (Mon, 25 Jun 2018 05:36:03 GMT) (full text, mbox, link).


Information forwarded to debian-bugs-dist@lists.debian.org, Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>:
Bug#877418; Package dh-strip-nondeterminism. (Mon, 25 Jun 2018 05:42:04 GMT) (full text, mbox, link).


Acknowledgement sent to Rob Browning <rlb@defaultvalue.org>:
Extra info received and forwarded to list. Copy sent to Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>. (Mon, 25 Jun 2018 05:42:04 GMT) (full text, mbox, link).


Message #158 received at 877418@bugs.debian.org (full text, mbox, reply):

From: Rob Browning <rlb@defaultvalue.org>
To: 877418@bugs.debian.org, control@bugs.debian.org
Subject: Re: Bug#877418 closed by Chris Lamb <lamby@debian.org> (Bug#877418: fixed in strip-nondeterminism 0.039-1)
Date: Mon, 25 Jun 2018 00:38:05 -0500
reopen 877418
thanks

Debian Bug Tracking System <owner@bugs.debian.org> writes:

>    * Clojure considers the .class file to be stale if it shares the same
>      timestamp of the .clj. We thus adjust the timestamps of the .clj to always
>      be younger. (Closes: #877418)
>      - {jar,zip}.pm: Allow $options{member_normalizer} callback to support
>        specifying the timestamp.
>      - zip.pm: Calculate the target canonical time in just one place.
>      * zip.pm: Ensure that we don't try and write an old timestamp;
>        Archive::Zip will do this anyway, just noisily.

Hmm, it looks like the fix may not have been sufficient, or there's been
a regression.  With the current clojure 1.8.0-5 and
dh-strip-nondeterminism 0.042-1, building clojure1.8 via "fakeroot
debian/rules binary" normally reveals:

  $ time java -cp debian/libclojure1.8-java/usr/share/java/clojure.jar clojure.main -e ''

  real    0m3.832s
  user    0m11.411s
  sys     0m0.206s

and with dh-strip-nondeterminism symlinked to /bin/true in the path:

  $ time java -cp debian/libclojure1.8-java/usr/share/java/clojure.jar clojure.main -e ''

  real    0m0.846s
  user    0m1.600s
  sys     0m0.076s

Looking inside the jar, core.clj and derived class files, for example,
do appear to have the same timestamps:

  $ jar tvf debian/libclojure1.8-java/usr/share/java/clojure.jar \
      | grep -E 'clojure/core(\.clj|\$interleave)'

     580 Mon Mar 19 16:59:46 CDT 2018 clojure/core$interleave$fn__5149.class
    1138 Mon Mar 19 16:59:46 CDT 2018 clojure/core$interleave$fn__5151.class
    1740 Mon Mar 19 16:59:46 CDT 2018 clojure/core$interleave$fn__5154.class
    1826 Mon Mar 19 16:59:46 CDT 2018 clojure/core$interleave.class
  251643 Mon Mar 19 16:59:46 CDT 2018 clojure/core.clj
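
The same check can be scripted over a whole jar. The following Python sketch is not part of any Debian tool, and it rests on one assumption: that the file Clojure compares against `foo.clj` is `foo__init.class`. It reads only the timestamps recorded in the zip metadata.

```python
import zipfile


def suspect_namespaces(jar_path):
    """Return .clj entries whose __init.class is not strictly newer.

    Assumption: '<ns>__init.class' is the class file compared against
    '<ns>.clj' at load time.
    """
    with zipfile.ZipFile(jar_path) as jar:
        stamps = {info.filename: info.date_time for info in jar.infolist()}
    suspects = []
    for name, clj_time in sorted(stamps.items()):
        if not name.endswith(".clj"):
            continue
        init_class = name[: -len(".clj")] + "__init.class"
        class_time = stamps.get(init_class)
        # date_time tuples (year, month, day, hour, minute, second)
        # compare in chronological order.
        if class_time is not None and class_time <= clj_time:
            suspects.append(name)
    return suspects
```

Calling `suspect_namespaces("debian/libclojure1.8-java/usr/share/java/clojure.jar")` on the jar listed above would be expected to report `clojure/core.clj` if its `__init.class` carries the same timestamp as the other class files shown.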

Thanks
-- 
Rob Browning
rlb @defaultvalue.org and @debian.org
GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A
GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4



Bug reopened Request was from Rob Browning <rlb@defaultvalue.org> to control@bugs.debian.org. (Mon, 25 Jun 2018 05:42:06 GMT) (full text, mbox, link).


No longer marked as fixed in versions strip-nondeterminism/0.039-1. Request was from Rob Browning <rlb@defaultvalue.org> to control@bugs.debian.org. (Mon, 25 Jun 2018 05:42:06 GMT) (full text, mbox, link).


Information forwarded to debian-bugs-dist@lists.debian.org, Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>:
Bug#877418; Package dh-strip-nondeterminism. (Tue, 26 Jun 2018 21:45:07 GMT) (full text, mbox, link).


Acknowledgement sent to Chris Lamb <lamby@debian.org>:
Extra info received and forwarded to list. Copy sent to Reproducible builds folks <reproducible-builds@lists.alioth.debian.org>. (Tue, 26 Jun 2018 21:45:07 GMT) (full text, mbox, link).


Message #167 received at 877418@bugs.debian.org (full text, mbox, reply):

From: Chris Lamb <lamby@debian.org>
To: Rob Browning <rlb@defaultvalue.org>, 877418@bugs.debian.org
Subject: Re: Bug#877418: closed by Chris Lamb <lamby@debian.org> (Bug#877418: fixed in strip-nondeterminism 0.039-1)
Date: Tue, 26 Jun 2018 22:41:57 +0100
tags 877418 + moreinfo
thanks

Hi Rob,

> Hmm, it looks like the fix may not have been sufficient, or there's been
> a regression.
[…]
> Looking inside the jar, core.clj and derived class files, for example,
> do appear to have the same timestamps:
> 
>      580 Mon Mar 19 16:59:46 CDT 2018 clojure/core$interleave$fn__5149.class
>     1138 Mon Mar 19 16:59:46 CDT 2018 clojure/core$interleave$fn__5151.class
>     1740 Mon Mar 19 16:59:46 CDT 2018 clojure/core$interleave$fn__5154.class
>     1826 Mon Mar 19 16:59:46 CDT 2018 clojure/core$interleave.class
>   251643 Mon Mar 19 16:59:46 CDT 2018 clojure/core.clj

I can't reproduce this with strip-nondeterminism 0.042-1 and
src:clojure 1.8.0-6:

   580 Mon Jun 25 00:02:08 BST 2018 clojure/core$interleave$fn__5149.class
  1138 Mon Jun 25 00:02:08 BST 2018 clojure/core$interleave$fn__5151.class
  1740 Mon Jun 25 00:02:08 BST 2018 clojure/core$interleave$fn__5154.class
  1826 Mon Jun 25 00:02:08 BST 2018 clojure/core$interleave.class
251643 Mon Jun 25 00:02:06 BST 2018 clojure/core.clj
                        ^^

I also can't reproduce this with src:clojure1.9 1.9.0-4:

   580 Mon Jun 25 00:56:18 BST 2018 clojure/core$interleave$fn__8515.class
  1140 Mon Jun 25 00:56:18 BST 2018 clojure/core$interleave$fn__8517.class
  1742 Mon Jun 25 00:56:18 BST 2018 clojure/core$interleave$fn__8520.class
  1826 Mon Jun 25 00:56:18 BST 2018 clojure/core$interleave.class
259697 Mon Jun 25 00:56:16 BST 2018 clojure/core.clj
                        ^^

Any ideas? :)


Regards,

-- 
      ,''`.
     : :'  :     Chris Lamb
     `. `'`      lamby@debian.org / chris-lamb.co.uk
       `-



Added tag(s) moreinfo. Request was from Chris Lamb <lamby@debian.org> to control@bugs.debian.org. (Tue, 26 Jun 2018 21:45:13 GMT) (full text, mbox, link).


Reply sent to Rob Browning <rlb@defaultvalue.org>:
You have taken responsibility. (Wed, 27 Jun 2018 01:15:03 GMT) (full text, mbox, link).


Notification sent to Rob Browning <rlb@defaultvalue.org>:
Bug acknowledged by developer. (Wed, 27 Jun 2018 01:15:03 GMT) (full text, mbox, link).


Message #174 received at 877418-done@bugs.debian.org (full text, mbox, reply):

From: Rob Browning <rlb@defaultvalue.org>
To: Chris Lamb <lamby@debian.org>, 877418-done@bugs.debian.org
Subject: Re: Bug#877418: closed by Chris Lamb <lamby@debian.org> (Bug#877418: fixed in strip-nondeterminism 0.039-1)
Date: Tue, 26 Jun 2018 20:03:43 -0500
Chris Lamb <lamby@debian.org> writes:

> I can't reproduce this with strip-nondeterminism 0.042-1 and
> src:clojure 1.8.0-6:

Oddly, I can't either.  It looks like maybe I hadn't actually pulled in
the latest clojure1.8 when I upgraded, but that doesn't explain why my
local build was also slow and I think had the homogeneous timestamps...

But I can verify that clojure1.8 1.8.0-6 seems fine now:

  $ dpkg --status clojure1.8 | grep Version
  Version: 1.8.0-6

  $ dlocate /usr/bin/clojure1.8
  clojure1.8: /usr/bin/clojure1.8

  $ time clojure1.8 -e ''

  real    0m0.852s
  user    0m1.640s
  sys     0m0.063s

  $ jar -tvf /usr/share/maven-repo/org/clojure/clojure/1.8.0/clojure-1.8.0.jar | grep -E 'clojure/core(\.clj|\$interleave)'
     580 Sun Jun 24 23:02:08 CDT 2018 clojure/core$interleave$fn__5149.class
    1138 Sun Jun 24 23:02:08 CDT 2018 clojure/core$interleave$fn__5151.class
    1740 Sun Jun 24 23:02:08 CDT 2018 clojure/core$interleave$fn__5154.class
    1826 Sun Jun 24 23:02:08 CDT 2018 clojure/core$interleave.class
  251643 Sun Jun 24 23:02:06 CDT 2018 clojure/core.clj

And I see the expected timestamps for the current sid clojure1.9 (via
clojure) too:

  $ dpkg --status clojure | grep Version
  Version: 1.9.0-4

  $ jar -tvf /usr/share/maven-repo/org/clojure/clojure/1.9.0/clojure-1.9.0.jar | grep -E 'clojure/core(\.clj|\$interleave)'
     580 Sun Jun 24 23:56:18 CDT 2018 clojure/core$interleave$fn__8515.class
    1140 Sun Jun 24 23:56:18 CDT 2018 clojure/core$interleave$fn__8517.class
    1742 Sun Jun 24 23:56:18 CDT 2018 clojure/core$interleave$fn__8520.class
    1826 Sun Jun 24 23:56:18 CDT 2018 clojure/core$interleave.class
  259697 Sun Jun 24 23:56:16 CDT 2018 clojure/core.clj

But I do see a ~3x slowdown for 1.9 as compared to a locally built
uberjar:

  $ clj-local
  Clojure 1.9.0
  user=> 

  $ time clj-local -e ''

  real    0m0.675s
  user    0m1.108s
  sys     0m0.075s

  $ time java -cp /usr/share/maven-repo/org/clojure/clojure/1.9.0/clojure-1.9.0.jar clojure.main -e ''

  real    0m2.252s
  user    0m5.976s
  sys     0m0.111s

In any case, at the moment I don't have any reason to think that's
timestamp-related, as compared to anything else, so I'll (re)close
this for now.

Thanks
-- 
Rob Browning
rlb @defaultvalue.org and @debian.org
GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A
GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4



Bug archived. Request was from Debbugs Internal Request <owner@bugs.debian.org> to internal_control@bugs.debian.org. (Wed, 25 Jul 2018 07:25:33 GMT) (full text, mbox, link).




Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Wed May 17 13:54:02 2023; Machine Name: bembo
