Debian Bug report logs - #875990
reproducible: i/o issues with profitbricks-build2-i386 since stretch upgrade

Package: jenkins.debian.org; Maintainer for jenkins.debian.org is Debian Jenkins Team <qa-jenkins-dev@lists.alioth.debian.org>

Reported by: Vagrant Cascadian <vagrant@debian.org>

Date: Sun, 17 Sep 2017 02:51:02 UTC

Severity: normal

Done: Holger Levsen <holger@layer-acht.org>

Bug is archived. No further changes may be made.


Report forwarded to debian-bugs-dist@lists.debian.org, Debian Jenkins Team <qa-jenkins-dev@lists.alioth.debian.org>:
Bug#875990; Package jenkins.debian.org. (Sun, 17 Sep 2017 02:51:04 GMT).


Acknowledgement sent to Vagrant Cascadian <vagrant@debian.org>:
New Bug report received and forwarded. Copy sent to Debian Jenkins Team <qa-jenkins-dev@lists.alioth.debian.org>. (Sun, 17 Sep 2017 02:51:04 GMT).


Message #5 received at submit@bugs.debian.org:

From: Vagrant Cascadian <vagrant@debian.org>
To: Debian Bug Tracking System <submit@bugs.debian.org>
Subject: reproducible: i/o issues with profitbricks-build2-i386 since stretch upgrade
Date: Sat, 16 Sep 2017 19:48:42 -0700
[Message part 1 (text/plain, inline)]
Package: jenkins.debian.org
Severity: normal

It looks like after the upgrade to stretch (late June/early July), two
of the i386 builders, profitbricks-build2-i386 and
profitbricks-build12-i386, suddenly developed large i/o issues.

You can see this on the munin graphs for the year, where the blue i/o
wait spikes:

  https://jenkins.debian.net/munin/debian.net/profitbricks-build2-i386.debian.net/cpu.html
  https://jenkins.debian.net/munin/debian.net/profitbricks-build12-i386.debian.net/cpu.html

Compare this to the other i386 builders, where there is no huge
spike in i/o wait:

  https://jenkins.debian.net/munin/debian.net/profitbricks-build6-i386.debian.net/cpu.html
  https://jenkins.debian.net/munin/debian.net/profitbricks-build16-i386.debian.net/cpu.html
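
For a spot check on a builder itself, without munin, the i/o wait share
can be estimated from two samples of the aggregate "cpu" line in
/proc/stat. This is only a minimal sketch: the function name
iowait_percent and the sampling interval are illustrative assumptions
and not part of the jenkins.debian.net setup.

```shell
#!/bin/sh
# iowait_percent LINE1 LINE2
# LINE1 and LINE2 are two samples of the aggregate "cpu" line from
# /proc/stat, taken some seconds apart. Prints the share of CPU time
# spent in i/o wait between the two samples, as a truncated integer
# percentage. (Hypothetical helper, for illustration only.)
iowait_percent() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            # fields 2..9: user nice system idle iowait irq softirq steal
            total[NR] = 0
            for (i = 2; i <= 9 && i <= NF; i++) total[NR] += $i
            iow[NR] = $6
        }
        END {
            dt = total[2] - total[1]
            if (dt <= 0) { print 0; exit }
            printf "%d\n", (iow[2] - iow[1]) * 100 / dt
        }'
}

# Live usage on a Linux host (commented out so that sourcing this file
# has no side effects):
#   first=$(grep '^cpu ' /proc/stat)
#   sleep 10
#   second=$(grep '^cpu ' /proc/stat)
#   iowait_percent "$first" "$second"
```

Running this on an affected and an unaffected builder under a similar
workload should show the same kind of difference as the munin graphs,
though over a much shorter window.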

I suspect this is reducing the i386 builds per day significantly,
averaging only ~1200 in the last 3 months.


My *hunch* is that build2 and build12 are running a PAE kernel with more
than 8GB of ram, and affected by this kernel bug (introduced in linux
~4.2, possibly):

  https://bugzilla.kernel.org/show_bug.cgi?id=196157
  https://bugs.launchpad.net/ubuntu/+source/linux-hwe/+bug/1698118
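
Whether a given host matches the suspected pattern (32-bit kernel, more
than 8GB of ram) can be checked roughly as follows. This is a sketch
under assumptions: the function name pae_highmem_candidate is made up
for illustration, the 8GB threshold is taken from the hunch above, and
"uname -m" can report i686 on an amd64 kernel when run under a 32-bit
personality (for example via linux32), so a match is only a hint.

```shell
#!/bin/sh
# pae_highmem_candidate MACHINE MEMTOTAL_KB
# Succeeds if MACHINE (as printed by "uname -m") is a 32-bit x86 name
# and MEMTOTAL_KB (MemTotal from /proc/meminfo, in kB) is above 8GB.
# (Hypothetical helper, for illustration only.)
pae_highmem_candidate() {
    case "$1" in
        i?86) ;;            # 32-bit x86 kernel (PAE or not)
        *) return 1 ;;      # e.g. x86_64: amd64 kernel, not a candidate
    esac
    [ "$2" -gt $((8 * 1024 * 1024)) ]
}

machine=$(uname -m)
# Fall back to 0 if /proc/meminfo is missing or unreadable.
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo 2>/dev/null)
mem_kb=${mem_kb:-0}

if pae_highmem_candidate "$machine" "$mem_kb"; then
    echo "32-bit kernel with more than 8GB of ram: matches the suspected pattern"
else
    echo "does not match the suspected pattern"
fi
```

The check says nothing about the running kernel version, so it cannot
confirm the bug on its own; it only narrows down which builders are
worth comparing.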


Reducing the ram of the affected builders to 8GB and having more PAE
builders with lighter workloads might be a workaround that would get
better performance... while still testing 32/64-bit kernel
variation.

Alternately, switching to only amd64 kernels might also fix the issue,
though wouldn't test 32/64-bit kernel variations.

Running a linux 4.1 kernel from snapshot.debian.org might be another way
to test the issue, even if not running long-term.


live well,
  vagrant
[signature.asc (application/pgp-signature, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Jenkins Team <qa-jenkins-dev@lists.alioth.debian.org>:
Bug#875990; Package jenkins.debian.org. (Sun, 17 Sep 2017 12:39:03 GMT).


Acknowledgement sent to Holger Levsen <holger@layer-acht.org>:
Extra info received and forwarded to list. Copy sent to Debian Jenkins Team <qa-jenkins-dev@lists.alioth.debian.org>. (Sun, 17 Sep 2017 12:39:03 GMT).


Message #10 received at 875990@bugs.debian.org:

From: Holger Levsen <holger@layer-acht.org>
To: Vagrant Cascadian <vagrant@debian.org>, 875990@bugs.debian.org
Subject: Re: [Qa-jenkins-dev] Bug#875990: reproducible: i/o issues with profitbricks-build2-i386 since stretch upgrade
Date: Sun, 17 Sep 2017 12:37:12 +0000
[Message part 1 (text/plain, inline)]
Hi Vagrant,

thanks for filing this bug and thus properly documenting what we had discovered,
discussed and lost on IRC already…

On Sat, Sep 16, 2017 at 07:48:42PM -0700, Vagrant Cascadian wrote:
> My *hunch* is that build2 and build12 are running a PAE kernel with more
> than 8GB of ram, and affected by this kernel bug (introduced in linux
> ~4.2, possibly):
>   https://bugzilla.kernel.org/show_bug.cgi?id=196157
>   https://bugs.launchpad.net/ubuntu/+source/linux-hwe/+bug/1698118
 
indeed!

> Reducing the ram of the affected builders to 8GB and having more PAE
> builders with lighter workloads might be a workaround that would get
> better performance... while still testing 32/64-bit kernel
> variation.

we lack the diskspace to do so.
 
> Alternately, switching to only amd64 kernels might also fix the issue,
> though wouldn't test 32/64-bit kernel variations.

indeed.

> Running a linux 4.1 kernel from snapshot.debian.org might be another way
> to test the issue, even if not running long-term.

yeah.

another option is to just wait. :/


-- 
cheers,
	Holger
[signature.asc (application/pgp-signature, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Jenkins Team <qa-jenkins-dev@lists.alioth.debian.org>:
Bug#875990; Package jenkins.debian.org. (Sun, 17 Sep 2017 16:30:03 GMT).


Acknowledgement sent to Vagrant Cascadian <vagrant@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Jenkins Team <qa-jenkins-dev@lists.alioth.debian.org>. (Sun, 17 Sep 2017 16:30:03 GMT).


Message #15 received at 875990@bugs.debian.org:

From: Vagrant Cascadian <vagrant@debian.org>
To: 875990@bugs.debian.org
Subject: Re: [Qa-jenkins-dev] Bug#875990: reproducible: i/o issues with profitbricks-build2-i386 since stretch upgrade
Date: Sun, 17 Sep 2017 09:18:52 -0700
[Message part 1 (text/plain, inline)]
On 2017-09-17, Holger Levsen wrote:
> On Sat, Sep 16, 2017 at 07:48:42PM -0700, Vagrant Cascadian wrote:
>> My *hunch* is that build2 and build12 are running a PAE kernel with more
>> than 8GB of ram, and affected by this kernel bug (introduced in linux
>> ~4.2, possibly):
>>   https://bugzilla.kernel.org/show_bug.cgi?id=196157
>>   https://bugs.launchpad.net/ubuntu/+source/linux-hwe/+bug/1698118
>  
> indeed!

Not sure if it makes sense to also file a bug in the debian bug tracker
about this...


>> Reducing the ram of the affected builders to 8GB and having more PAE
>> builders with lighter workloads might be a workaround that would get
>> better performance... while still testing 32/64-bit kernel
>> variation.
>
> we lack the diskspace to do so.

Then it might still get better performance to lower the PAE builders to
only use 8GB of ram, even if that means running fewer jobs in
parallel...


>> Running a linux 4.1 kernel from snapshot.debian.org might be another way
>> to test the issue, even if not running long-term.
>
> yeah.

Now that I think about it, switching back to a 3.16.x kernel from jessie
for the PAE builders should be viable at least as long as jessie-lts is
around...


> another option is to just wait. :/

I suspect that *might* be an infinite wait; I get the impression this is
a very low-priority issue upstream, and it would take some active
attempt to fix it upstream...


live well,
  vagrant
[signature.asc (application/pgp-signature, inline)]

Added blocking bug(s) of 875990: 876035 Request was from Vagrant Cascadian <vagrant@debian.org> to submit@bugs.debian.org. (Sun, 17 Sep 2017 17:21:05 GMT).


Information forwarded to debian-bugs-dist@lists.debian.org, Debian Jenkins Team <qa-jenkins-dev@lists.alioth.debian.org>:
Bug#875990; Package jenkins.debian.org. (Sun, 21 Jan 2018 22:27:03 GMT).


Acknowledgement sent to Vagrant Cascadian <vagrant@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Jenkins Team <qa-jenkins-dev@lists.alioth.debian.org>. (Sun, 21 Jan 2018 22:27:03 GMT).


Message #22 received at 875990@bugs.debian.org:

From: Vagrant Cascadian <vagrant@debian.org>
To: 875990@bugs.debian.org
Subject: Re: [Qa-jenkins-dev] Bug#875990: reproducible: i/o issues with profitbricks-build2-i386 since stretch upgrade
Date: Sun, 21 Jan 2018 14:22:47 -0800
[Message part 1 (text/plain, inline)]
On 2017-09-17, Vagrant Cascadian wrote:
> On 2017-09-17, Holger Levsen wrote:
>> On Sat, Sep 16, 2017 at 07:48:42PM -0700, Vagrant Cascadian wrote:
>>> My *hunch* is that build2 and build12 are running a PAE kernel with more
>>> than 8GB of ram, and affected by this kernel bug (introduced in linux
>>> ~4.2, possibly):
>>>   https://bugzilla.kernel.org/show_bug.cgi?id=196157
>>>   https://bugs.launchpad.net/ubuntu/+source/linux-hwe/+bug/1698118
>>  
>> indeed!
>
> Not sure if it makes sense to also file a bug in the debian bug tracker
> about this...

I did end up filing https://bugs.debian.org/876035 and the only response
so far was to downgrade it to minor, due to the unusual use-case of
running i386 userspace with a PAE kernel instead of an amd64 kernel
these days.


>>> Reducing the ram of the affected builders to 8GB and having more PAE
>>> builders with lighter workloads might be a workaround that would get
>>> better performance... while still testing 32/64-bit kernel
>>> variation.
>>
>> we lack the diskspace to do so.
>
> Then it might still get better performance to lower the PAE builders to
> only use 8GB of ram, even if that means running fewer jobs in
> parallel...

Again, I think simply lowering the ram to 8GB might actually result in
better performance, as the degradation is non-linear. The ram usage
patterns of the i386 builders show that they infrequently use more than
8GB:

  https://jenkins.debian.net/munin/debian.net/profitbricks-build12-i386.debian.net/memory.html

Or even 12GB or 16GB, though that will trigger the i/o wait issue more.


>> another option is to just wait. :/
>
> I suspect that *might* be an infinite wait; I get the impression this is
> a very low-priority issue upstream, and it would take some active
> attempt to fix it upstream...

Haven't seen any progress on the issue in Debian or upstream, several
months later...


live well,
  vagrant
[signature.asc (application/pgp-signature, inline)]

Reply sent to Holger Levsen <holger@layer-acht.org>:
You have taken responsibility. (Mon, 10 Sep 2018 21:00:06 GMT).


Notification sent to Vagrant Cascadian <vagrant@debian.org>:
Bug acknowledged by developer. (Mon, 10 Sep 2018 21:00:06 GMT).


Message #27 received at 875990-done@bugs.debian.org:

From: Holger Levsen <holger@layer-acht.org>
To: 875990-done@bugs.debian.org
Subject: [Git][qa/jenkins.debian.net][master] reproducible Debian: use amd64 kernels on all i386 nodes (Closes: #875990]
Date: Mon, 10 Sep 2018 20:57:24 +0000
[Message part 1 (text/plain, inline)]
----- Forwarded message from Holger Levsen <gitlab@salsa.debian.org> -----

Date: Mon, 10 Sep 2018 20:48:48 +0000
From: Holger Levsen <gitlab@salsa.debian.org>
To: qa-jenkins-scm@lists.alioth.debian.org
Subject: [Qa-jenkins-scm] [Git][qa/jenkins.debian.net][master] reproducible Debian: use amd64 kernels on all i386 nodes (Closes: #875990
List-Id: "SCM mails for the development of jenkins.debian.org" <qa-jenkins-scm.alioth-lists.debian.net>
Reply-To: noreply@salsa.debian.org

Holger Levsen pushed to branch master at Debian QA / jenkins.debian.net


Commits:
0fef9342 by Holger Levsen at 2018-09-10T20:48:28Z
reproducible Debian: use amd64 kernels on all i386 nodes (Closes: #875990

Signed-off-by: Holger Levsen <holger@layer-acht.org>

- - - - -


3 changed files:

- − hosts/profitbricks-build12-i386/etc/apt/sources.list
- − hosts/profitbricks-build2-i386/etc/apt/sources.list
- update_jdn.sh


Changes:

=====================================
hosts/profitbricks-build12-i386/etc/apt/sources.list deleted
=====================================
@@ -1,15 +0,0 @@
-deb http://deb.debian.org/debian/ stretch main contrib non-free
-#deb-src http://deb.debian.org/debian/ stretch main contrib non-free
-
-deb http://deb.debian.org/debian/ stretch-updates main contrib non-free
-#deb-src http://deb.debian.org/debian/ stretch-updates main contrib non-free
-
-deb http://security.debian.org/ stretch/updates main contrib non-free
-#deb-src http://security.debian.org/ stretch/updates main contrib non-free
-
-deb http://deb.debian.org/debian/ stretch-backports main contrib non-free
-#deb-src http://deb.debian.org/debian/ stretch-backports main contrib non-free
-
-# workaround for i386 kernel bugs #875990 + #876035
-deb http://deb.debian.org/debian-security jessie/updates main
-deb http://deb.debian.org/debian jessie main


=====================================
hosts/profitbricks-build2-i386/etc/apt/sources.list deleted
=====================================
@@ -1,15 +0,0 @@
-deb http://deb.debian.org/debian/ stretch main contrib non-free
-#deb-src http://deb.debian.org/debian/ stretch main contrib non-free
-
-deb http://deb.debian.org/debian/ stretch-updates main contrib non-free
-#deb-src http://deb.debian.org/debian/ stretch-updates main contrib non-free
-
-deb http://security.debian.org/ stretch/updates main contrib non-free
-#deb-src http://security.debian.org/ stretch/updates main contrib non-free
-
-deb http://deb.debian.org/debian/ stretch-backports main contrib non-free
-#deb-src http://deb.debian.org/debian/ stretch-backports main contrib non-free
-
-# workaround for i386 kernel bugs #875990 + #876035
-deb http://deb.debian.org/debian-security jessie/updates main
-deb http://deb.debian.org/debian jessie main


=====================================
update_jdn.sh
=====================================
@@ -469,11 +469,12 @@ if [ -f /etc/debian_version ] ; then
 			$UP2DATE || sudo apt-get install mock
 		fi
 		# for varying kernels:
-		# - we use bpo kernels on pb-build5+15 (and the default i386 kernel on pb-build2+12-i386)
-		# - we use the default amd64 kernel on pb-build1+11 (and the default amd64 kernel on pb-build6+16-i386)
+		# - we use bpo kernels on pb-build5+15 (and the default amd64 kernel on pb-build6+16-i386)
 		if [ "$HOSTNAME" = "profitbricks-build5-amd64" ] || [ "$HOSTNAME" = "profitbricks-build15-amd64" ] ; then
 			$UP2DATE || sudo apt-get install -t stretch-backports linux-image-amd64
-		elif [ "$HOSTNAME" = "profitbricks-build6-i386" ] || [ "$HOSTNAME" = "profitbricks-build16-i386" ] ; then
+		elif [ "$HOSTNAME" = "profitbricks-build6-i386" ] || [ "$HOSTNAME" = "profitbricks-build16-i386" ] \
+			|| [ "$HOSTNAME" = "profitbricks-build2-i386" ] || [ "$HOSTNAME" = "profitbricks-build12-i386" ] ; then
+			# we dont vary the kernel on i386 atm, see #875990 + #876035
 			$UP2DATE || sudo apt-get install linux-image-amd64
 		fi
 		# only needed on the main nodes



View it on GitLab: https://salsa.debian.org/qa/jenkins.debian.net/commit/0fef93423e8f485a5d2c866b86191c859c6787a4


----- End forwarded message -----

-- 
cheers,
	Holger

-------------------------------------------------------------------------------
               holger@(debian|reproducible-builds|layer-acht).org
       PGP fingerprint: B8BF 5413 7B09 D35C F026 FE9D 091A B856 069A AA1C
[signature.asc (application/pgp-signature, inline)]

Bug archived. Request was from Debbugs Internal Request <owner@bugs.debian.org> to internal_control@bugs.debian.org. (Tue, 09 Oct 2018 07:32:37 GMT).


Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Wed May 17 10:48:48 2023; Machine Name: buxtehude
