Debian Bug report logs - #814183
openmpi 1.10.2 is broken on powerpc
Reported by: Matthias Klose <doko@debian.org>
Date: Mon, 8 Feb 2016 21:06:02 UTC
Severity: serious
Tags: sid, stretch
Found in version openmpi/1.10.2-5
Done: Alastair McKinstry <mckinstry@debian.org>
Bug is archived. No further changes may be made.
Report forwarded
to debian-bugs-dist@lists.debian.org, Alastair McKinstry <mckinstry@debian.org>:
Bug#814183; Package src:openmpi.
(Mon, 08 Feb 2016 21:06:05 GMT)
Acknowledgement sent
to Matthias Klose <doko@debian.org>:
New Bug report received and forwarded. Copy sent to Alastair McKinstry <mckinstry@debian.org>.
(Mon, 08 Feb 2016 21:06:06 GMT)
Message #5 received at submit@bugs.debian.org:
Package: src:openmpi
Version: 1.10.2-5
Severity: serious
Tags: sid stretch
openmpi 1.10.2 is broken on powerpc.
Graham Inggs confirmed that at least aces3 and petsc fail in the same way in
Debian unstable, as soon as the MPI test program is launched.
[...]
Possible error running C/C++ src/snes/examples/tutorials/ex19 with 1 MPI process
See http://www.mcs.anl.gov/petsc/documentation/faq.html
--------------------------------------------------------------------------
A deprecated MCA variable value was specified in the environment or
on the command line. Deprecated MCA variables should be avoided;
they may disappear in future releases.
Deprecated variable: orte_rsh_agent
New variable: plm_rsh_agent
--------------------------------------------------------------------------
lid velocity = 0.0016, prandtl # = 1, grashof # = 1
Number of SNES iterations = 2
Session terminated, terminating shell... ...terminated.
make: *** [build-arch] Terminated
build logs:
https://launchpad.net/ubuntu/+source/aces3/3.0.8-5build2/+build/8974836
https://launchpad.net/ubuntu/+source/petsc/3.6.2.dfsg1-3build2/+build/8975053
Information forwarded
to debian-bugs-dist@lists.debian.org, Alastair McKinstry <mckinstry@debian.org>:
Bug#814183; Package src:openmpi.
(Tue, 09 Feb 2016 06:24:08 GMT)
Acknowledgement sent
to Graham Inggs <ginggs@debian.org>:
Extra info received and forwarded to list. Copy sent to Alastair McKinstry <mckinstry@debian.org>.
(Tue, 09 Feb 2016 06:24:08 GMT)
Message #10 received at 814183@bugs.debian.org:
I don't believe the warning below is related to the problem.
> A deprecated MCA variable value was specified in the environment or
> on the command line. Deprecated MCA variables should be avoided;
> they may disappear in future releases.
It can be avoided by changing the following line in petsc's debian/rules
export OMPI_MCA_orte_rsh_agent=/bin/false
to
export OMPI_MCA_plm_rsh_agent=/bin/false
Unfortunately, this does not prevent the build from ending with the following (as aces3's does):
Build killed with signal TERM after 150 minutes of inactivity
On powerpc, running one of petsc's tests on one processor returns a
result instantly:
$ mpiexec -n 1 ./ex19 -da_refine 3 -snes_monitor_short -pc_type mg -ksp_type fgmres -pc_mg_type full
lid velocity = 0.0016, prandtl # = 1, grashof # = 1
0 SNES Function norm 0.0406612
1 SNES Function norm 3.33636e-06
2 SNES Function norm 1.653e-11
Number of SNES iterations = 2
Running it on two processors never completes:
$ mpiexec -n 2 ./ex19 -da_refine 3 -snes_monitor_short -pc_type mg -ksp_type fgmres -pc_mg_type full
lid velocity = 0.0016, prandtl # = 1, grashof # = 1
0 SNES Function norm 0.0406612
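The buildd only kills the hung two-process run after 150 minutes of inactivity. Below is a minimal C sketch of a watchdog that fails fast instead, assuming a POSIX system. It runs the command with fork()/execvp(), polls waitpid() with WNOHANG, and kills the command's process group once a deadline passes. run_with_timeout is a hypothetical helper, not something petsc or aces3 ship. From a shell, coreutils' timeout(1) serves the same purpose.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical watchdog: run argv (e.g. an mpiexec command line) and give up
 * after timeout_s seconds. Returns the child's exit status (127 if exec
 * failed), or -1 if the command was killed, crashed, or could not start. */
int run_with_timeout(char *const argv[], unsigned timeout_s)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        setpgid(0, 0);   /* own process group; assumes local ranks stay in it */
        execvp(argv[0], argv);
        _exit(127);      /* only reached if exec failed */
    }
    setpgid(pid, pid);   /* also set from the parent to avoid a race */

    time_t deadline = time(NULL) + (time_t)timeout_s;
    for (;;) {
        int status;
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
        if (r < 0)
            return -1;
        if (time(NULL) >= deadline) {
            fprintf(stderr, "%s: no result after %u s, killing\n",
                    argv[0], timeout_s);
            kill(-pid, SIGKILL);        /* signal the whole process group */
            waitpid(pid, &status, 0);   /* reap the killed child */
            return -1;
        }
        struct timespec ts = {0, 100L * 1000 * 1000}; /* poll every 100 ms */
        nanosleep(&ts, NULL);
    }
}
```

For instance, the two-process ex19 command above could be passed as argv with a limit of a few minutes, since the one-process run returns instantly. Killing the process group, rather than only mpiexec, is meant to avoid leaving orphaned ranks behind. This assumes mpiexec does not move its local ranks into a new group.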
Information forwarded
to debian-bugs-dist@lists.debian.org, Alastair McKinstry <mckinstry@debian.org>:
Bug#814183; Package src:openmpi.
(Tue, 09 Feb 2016 19:51:07 GMT)
Acknowledgement sent
to Graham Inggs <ginggs@debian.org>:
Extra info received and forwarded to list. Copy sent to Alastair McKinstry <mckinstry@debian.org>.
(Tue, 09 Feb 2016 19:51:07 GMT)
Message #15 received at 814183@bugs.debian.org:
Petsc rebuilt successfully [1] a couple of hours ago on poulenc.d.o. [2].
My previous tests were done on partch.d.o. [3]. Partch has 2GB of RAM
vs Poulenc's 5GB; I don't know if this is significant.
[1] https://buildd.debian.org/status/fetch.php?pkg=petsc&arch=powerpc&ver=3.6.2.dfsg1-3%2Bb3&stamp=1455016089
[2] https://db.debian.org/machines.cgi?host=poulenc
[3] https://db.debian.org/machines.cgi?host=partch
Information forwarded
to debian-bugs-dist@lists.debian.org, Alastair McKinstry <mckinstry@debian.org>:
Bug#814183; Package src:openmpi.
(Thu, 11 Feb 2016 23:21:08 GMT)
Acknowledgement sent
to Emilio Pozuelo Monfort <pochu@debian.org>:
Extra info received and forwarded to list. Copy sent to Alastair McKinstry <mckinstry@debian.org>.
(Thu, 11 Feb 2016 23:21:08 GMT)
Message #20 received at 814183@bugs.debian.org:
On Tue, 9 Feb 2016 21:49:29 +0200 Graham Inggs <ginggs@debian.org> wrote:
> Petsc rebuilt successfully [1] a couple of hours ago on poulenc.d.o. [2].
> My previous tests were done on partch.d.o. [3]. Partch has 2GB of RAM
> vs Poulenc's 5GB, I don't know if this is significant.
aces3 failed on powerpc-osuosl-01.
poulenc is a PPC970FX
partch is a POWER7
powerpc-osuosl-01 is a POWER8
Dunno if that is relevant.
Cheers,
Emilio
Information forwarded
to debian-bugs-dist@lists.debian.org, Alastair McKinstry <mckinstry@debian.org>:
Bug#814183; Package src:openmpi.
(Fri, 12 Feb 2016 07:39:08 GMT)
Acknowledgement sent
to Graham Inggs <ginggs@debian.org>:
Extra info received and forwarded to list. Copy sent to Alastair McKinstry <mckinstry@debian.org>.
(Fri, 12 Feb 2016 07:39:08 GMT)
Message #25 received at 814183@bugs.debian.org:
On 12 February 2016 at 01:17, Emilio Pozuelo Monfort <pochu@debian.org> wrote:
> On Tue, 9 Feb 2016 21:49:29 +0200 Graham Inggs <ginggs@debian.org> wrote:
>> Petsc rebuilt successfully [1] a couple of hours ago on poulenc.d.o. [2].
>> My previous tests were done on partch.d.o. [3]. Partch has 2GB of RAM
>> vs Poulenc's 5GB, I don't know if this is significant.
>
> aces3 failed on powerpc-osuosl-01.
>
> poulenc is a PPC970FX
> partch is a POWER7
> powerpc-osuosl-01 is a POWER8
>
> Dunno if that is relevant.
It might be, thanks! Is there any way to arrange for aces3 to be
rebuilt on poulenc? That should tell us something.
Information forwarded
to debian-bugs-dist@lists.debian.org, Alastair McKinstry <mckinstry@debian.org>:
Bug#814183; Package src:openmpi.
(Sat, 20 Feb 2016 14:45:03 GMT)
Acknowledgement sent
to Emilio Pozuelo Monfort <pochu@debian.org>:
Extra info received and forwarded to list. Copy sent to Alastair McKinstry <mckinstry@debian.org>.
(Sat, 20 Feb 2016 14:45:03 GMT)
Message #30 received at 814183@bugs.debian.org:
On Fri, 12 Feb 2016 09:25:56 +0200 Graham Inggs <ginggs@debian.org> wrote:
> On 12 February 2016 at 01:17, Emilio Pozuelo Monfort <pochu@debian.org> wrote:
> > On Tue, 9 Feb 2016 21:49:29 +0200 Graham Inggs <ginggs@debian.org> wrote:
> >> Petsc rebuilt successfully [1] a couple of hours ago on poulenc.d.o. [2].
> >> My previous tests were done on partch.d.o. [3]. Partch has 2GB of RAM
> >> vs Poulenc's 5GB, I don't know if this is significant.
> >
> > aces3 failed on powerpc-osuosl-01.
> >
> > poulenc is a PPC970FX
> > partch is a POWER7
> > powerpc-osuosl-01 is a POWER8
> >
> > Dunno if that is relevant.
>
> It might be, thanks! Is there any way to arrange for aces3 to be
> rebuilt on poulenc? That should tell us something.
It built on poulenc and failed on powerpc-osuosl-01:
https://buildd.debian.org/status/logs.php?pkg=aces3&ver=3.0.8-5%2Bb1&arch=powerpc
Emilio
Merged 813722 814183
Request was from Graham Inggs <ginggs@debian.org>
to control@bugs.debian.org.
(Sun, 21 Feb 2016 08:15:08 GMT)
Information forwarded
to debian-bugs-dist@lists.debian.org, Alastair McKinstry <mckinstry@debian.org>:
Bug#814183; Package src:openmpi.
(Mon, 29 Feb 2016 17:51:10 GMT)
Acknowledgement sent
to Graham Inggs <ginggs@debian.org>:
Extra info received and forwarded to list. Copy sent to Alastair McKinstry <mckinstry@debian.org>.
(Mon, 29 Feb 2016 17:51:10 GMT)
Message #37 received at 814183@bugs.debian.org:
I filed LP: #1550863 [1] to track the powerpc build failures in Ubuntu.
[1] https://bugs.launchpad.net/bugs/1550863
Disconnected #813722 from all other report(s).
Request was from Graham Inggs <ginggs@debian.org>
to control@bugs.debian.org.
(Fri, 04 Mar 2016 18:45:36 GMT)
Information forwarded
to debian-bugs-dist@lists.debian.org, Alastair McKinstry <mckinstry@debian.org>:
Bug#814183; Package src:openmpi.
(Fri, 04 Mar 2016 19:12:20 GMT)
Acknowledgement sent
to Graham Inggs <ginggs@debian.org>:
Extra info received and forwarded to list. Copy sent to Alastair McKinstry <mckinstry@debian.org>.
(Fri, 04 Mar 2016 19:12:20 GMT)
Message #44 received at 814183@bugs.debian.org:
On 3 March 2016 at 13:47, Emilio Pozuelo Monfort <pochu@debian.org> wrote:
> Might be related to #813722 / #814183.
Definitely.
ELPA built on poulenc and praetorius, but failed on powerpc-osuosl-01:
https://buildd.debian.org/status/logs.php?pkg=elpa&arch=powerpc
I only looked at elpa >= 2015.05.001-1 (i.e. since openmpi 1.10), ignoring
failures that took less than 2.5 hours, as those were due to packaging bugs.
Information forwarded
to debian-bugs-dist@lists.debian.org, Alastair McKinstry <mckinstry@debian.org>:
Bug#814183; Package src:openmpi.
(Mon, 25 Apr 2016 19:06:06 GMT)
Acknowledgement sent
to Graham Inggs <ginggs@debian.org>:
Extra info received and forwarded to list. Copy sent to Alastair McKinstry <mckinstry@debian.org>.
(Mon, 25 Apr 2016 19:06:06 GMT)
Message #49 received at 814183@bugs.debian.org:
Please see #816101 [1]. It seems the powerpc and mipsel issues are
closely related.
The PETSc package maintainer conditionally disabled the two-process MPI
tests on powerpc and mipsel to work around the problem.
[1] https://bugs.debian.org/816101
Information forwarded
to debian-bugs-dist@lists.debian.org, Alastair McKinstry <mckinstry@debian.org>:
Bug#814183; Package src:openmpi.
(Sun, 04 Sep 2016 14:45:03 GMT)
Acknowledgement sent
to Emilio Pozuelo Monfort <pochu@debian.org>:
Extra info received and forwarded to list. Copy sent to Alastair McKinstry <mckinstry@debian.org>.
(Sun, 04 Sep 2016 14:45:03 GMT)
Message #54 received at 814183@bugs.debian.org:
On Fri, 12 Feb 2016 00:17:28 +0100 Emilio Pozuelo Monfort <pochu@debian.org> wrote:
> On Tue, 9 Feb 2016 21:49:29 +0200 Graham Inggs <ginggs@debian.org> wrote:
> > Petsc rebuilt successfully [1] a couple of hours ago on poulenc.d.o. [2].
> > My previous tests were done on partch.d.o. [3]. Partch has 2GB of RAM
> > vs Poulenc's 5GB, I don't know if this is significant.
>
> aces3 failed on powerpc-osuosl-01.
>
> poulenc is a PPC970FX
> partch is a POWER7
> powerpc-osuosl-01 is a POWER8
Any progress on this? Has this been forwarded upstream?
Emilio
Information forwarded
to debian-bugs-dist@lists.debian.org, Alastair McKinstry <mckinstry@debian.org>:
Bug#814183; Package src:openmpi.
(Sun, 04 Sep 2016 14:51:04 GMT)
Acknowledgement sent
to Alastair McKinstry <alastair.mckinstry@sceal.ie>:
Extra info received and forwarded to list. Copy sent to Alastair McKinstry <mckinstry@debian.org>.
(Sun, 04 Sep 2016 14:51:04 GMT)
Message #59 received at 814183@bugs.debian.org:
On 04/09/2016 15:43, Emilio Pozuelo Monfort wrote:
> On Fri, 12 Feb 2016 00:17:28 +0100 Emilio Pozuelo Monfort <pochu@debian.org> wrote:
>> On Tue, 9 Feb 2016 21:49:29 +0200 Graham Inggs <ginggs@debian.org> wrote:
>>> Petsc rebuilt successfully [1] a couple of hours ago on poulenc.d.o. [2].
>>> My previous tests were done on partch.d.o. [3]. Partch has 2GB of RAM
>>> vs Poulenc's 5GB, I don't know if this is significant.
>> aces3 failed on powerpc-osuosl-01.
>>
>> poulenc is a PPC970FX
>> partch is a POWER7
>> powerpc-osuosl-01 is a POWER8
> Any progress on this? Has this been forwarded upstream?
Yes, reported upstream.
I'm testing out a new version 2.0.1 that may have a fix.
>
> Emilio
Alastair
--
Alastair McKinstry, <alastair@sceal.ie>, <mckinstry@debian.org>, https://diaspora.sceal.ie/u/amckinstry
Misentropy: doubting that the Universe is becoming more disordered.
Reply sent
to Alastair McKinstry <mckinstry@debian.org>:
You have taken responsibility.
(Tue, 13 Sep 2016 15:15:07 GMT)
Notification sent
to Matthias Klose <doko@debian.org>:
Bug acknowledged by developer.
(Tue, 13 Sep 2016 15:15:07 GMT)
Message #64 received at 814183-done@bugs.debian.org:
close 812733
close 814183
thanks
I'm closing these bugs as fixed / unreproducible in 2.0.1-5. In
particular, I've rebuilt both aces3 and petsc on powerpc and mipsel (on
partch.debian.org and eller.debian.org), and they build successfully.
There have been code changes and bug fixes in the wait/lock code, and
both architectures now use standard gcc atomics, which means the
relevant code paths have changed.
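"Standard gcc atomics" refers to GCC's __atomic builtins. With these builtins, the code states the memory ordering it needs (acquire, release) explicitly. The compiler then emits the matching barrier instructions for each architecture, including weakly ordered ones such as POWER and MIPS. Presumably this replaces per-architecture inline assembly. Below is a minimal sketch of a test-and-set spinlock built only on those builtins, assuming GCC or a compatible compiler. It is an illustration, not Open MPI's actual lock code. spinlock_t, spin_lock, spin_unlock and run_two_workers are hypothetical names.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical test-and-set spinlock using only GCC __atomic builtins
 * (illustration only; not Open MPI's implementation). */
typedef struct {
    bool locked;
} spinlock_t;

static void spin_lock(spinlock_t *l)
{
    /* Acquire ordering: accesses in the critical section cannot be moved
     * before the lock is taken, even on a weakly ordered CPU. */
    while (__atomic_test_and_set(&l->locked, __ATOMIC_ACQUIRE))
        ; /* busy-wait until the previous holder clears the flag */
}

static void spin_unlock(spinlock_t *l)
{
    /* Release ordering: writes made while holding the lock are visible to
     * the next thread that acquires it. */
    __atomic_clear(&l->locked, __ATOMIC_RELEASE);
}

/* Demo: two threads increment a shared counter under the lock. */
static spinlock_t lock;   /* static storage, so it starts unlocked */
static long counter;

static void *worker(void *arg)
{
    long n = *(const long *)arg;
    for (long i = 0; i < n; i++) {
        spin_lock(&lock);
        counter++;          /* protected by the spinlock */
        spin_unlock(&lock);
    }
    return NULL;
}

long run_two_workers(long n)
{
    pthread_t a, b;
    counter = 0;
    pthread_create(&a, NULL, worker, &n);
    pthread_create(&b, NULL, worker, &n);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return counter;         /* 2 * n if the lock gives mutual exclusion */
}
```

Build with gcc -pthread. If the acquire/release orderings were weakened to __ATOMIC_RELAXED, the count could come up short on weakly ordered hardware such as POWER, even though it might still look correct on x86.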
Please reopen if the bug is seen again, but it is believed fixed.
Regards
Alastair
Information forwarded
to debian-bugs-dist@lists.debian.org, Alastair McKinstry <mckinstry@debian.org>:
Bug#814183; Package src:openmpi.
(Wed, 14 Sep 2016 16:54:07 GMT)
Acknowledgement sent
to Graham Inggs <ginggs@debian.org>:
Extra info received and forwarded to list. Copy sent to Alastair McKinstry <mckinstry@debian.org>.
(Wed, 14 Sep 2016 16:54:07 GMT)
Message #69 received at 814183@bugs.debian.org:
Hi Alastair
On 13 September 2016 at 17:15, Debian Bug Tracking System
<owner@bugs.debian.org> wrote:
> I'm closing these bugs as fixed / unreproducible in 2.0.1-5. In
> particular I've rebuilt both aces3 and petsc on powerpc and mipsel (on
> partch.debian.org and eller.debian.org) and they build successfully.
Note that many of the packages already carry patches to skip tests on
powerpc or to limit the number of MPI processes there to one (np=1).
> There have been code changes and bug fixes in the wait/lock code, as
> well as now using standard gcc atomics on both architectures, which
> means the relevant code paths have changed.
That's good to hear.
> Please reopen if the bug is seen again, but it is believed fixed.
That's fine with me.
Regards
Graham
Bug archived.
Request was from Debbugs Internal Request <owner@bugs.debian.org>
to internal_control@bugs.debian.org.
(Thu, 13 Oct 2016 07:31:31 GMT)
Debian bug tracking system administrator <owner@bugs.debian.org>.
Last modified:
Sat Jan 6 11:03:44 2018;
Machine Name:
buxtehude
Debian Bug tracking system
Debbugs is free software and licensed under the terms of the GNU
Public License version 2. The current version can be obtained
from https://bugs.debian.org/debbugs-source/.
Copyright © 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson,
2005-2017 Don Armstrong, and many other contributors.