Debian Bug report logs - #707178
RFP: breakin -- stress-test and hardware diagnostics tool

Package: wnpp; Maintainer for wnpp is wnpp@debian.org;

Reported by: Antoine Beaupré <anarcat@debian.org>

Date: Tue, 7 May 2013 22:57:01 UTC

Severity: wishlist

Fixed in version stressant/0.3.1

Done: Antoine Beaupré <anarcat@debian.org>

Bug is archived. No further changes may be made.


Report forwarded to debian-bugs-dist@lists.debian.org, debian-devel@lists.debian.org, taggart@riseup.net, wnpp@debian.org, "Antoine Beaupré" <anarcat@debian.org>:
Bug#707178; Package wnpp. (Tue, 07 May 2013 22:57:06 GMT) (full text, mbox, link).


Acknowledgement sent to Antoine Beaupré <anarcat@debian.org>:
New Bug report received and forwarded. Copy sent to debian-devel@lists.debian.org, taggart@riseup.net, wnpp@debian.org, "Antoine Beaupré" <anarcat@debian.org>. (Tue, 07 May 2013 22:57:06 GMT) (full text, mbox, link).


Message #5 received at submit@bugs.debian.org (full text, mbox, reply):

From: Antoine Beaupré <anarcat@debian.org>
To: Debian Bug Tracking System <submit@bugs.debian.org>
Subject: ITP: breakin -- stress-test and hardware diagnostics tool
Date: Tue, 07 May 2013 18:55:21 -0400
Package: wnpp
Severity: wishlist
Owner: "Antoine Beaupré" <anarcat@debian.org>

* Package name    : breakin
  Version         : 3.20
  Upstream Author : Jason D. Clinton <me@jasonclinton.com>, Kyle Sheumaker <ksheumaker@advancedclustering.com>
* URL             : http://www.advancedclustering.com/software/breakin.html
* License         : GPL2
  Programming Lang: C
  Description     : stress-test and hardware diagnostics tool

Breakin is Advanced Clustering's stress-test and hardware diagnostics
tool. Advanced Clustering engineers developed breakin because no other
available product — commercial or open source — could pinpoint hardware
issues and component failures as well as they wanted.

= Packaging notes =

The software is split into two main parts: the "breakin" software, which
is a fairly self-contained C program with affiliated scripts which runs
on boot up, and the "bootimage" distribution, which is a custom Linux
distribution that is built from the ground up.

I believe that the "breakin" part should be made into a Debian package,
and the "bootimage" part should be discarded or made into a "debirf"
extension.

The dependencies of breakin are unclear, but the bootimage build scripts
build the following parts:

anarcat@desktop006:bootimage$ ls srcctrl/
actdmi    dropbear    htop        lvm2      parted         syslinux     wget
afio      e2fsprogs   initfs      mcelog    pciutils       tcpdump
arecacli  edac-utils  ipmitool    mdadm     rsync          terminfo
bonnie++  ethtool     kernel      mdinfo    screen         udev
breakin   hddtemp     links       mstflint  smartmontools  udpcast
busybox   hpl         lm-sensors  numactl   stream         util-linux

A lot of those are available in Debian already, except:

actdmi
arecacli
breakin
hpl
initfs
kernel
mdinfo
stream
terminfo

Specific assessment should be done on those dependencies.
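
As a rough sketch, the availability check above could be repeated on a
Debian system along these lines. It assumes the bootimage component names
match Debian package names, which is not always the case (for example
"kernel" or "initfs"), and the helper names debian_has and
missing_from_debian are made up for illustration.

```shell
#!/bin/sh
# Sketch: list which bootimage components have no same-named package in the
# configured APT sources. Function names are hypothetical.

# Return 0 if APT knows a package with exactly this name.
debian_has() {
    apt-cache show "$1" >/dev/null 2>&1
}

# Print every argument for which debian_has fails, one per line.
missing_from_debian() {
    for pkg in "$@"; do
        debian_has "$pkg" || printf '%s\n' "$pkg"
    done
}
```

For example, "missing_from_debian actdmi rsync hpl" prints the names APT
cannot find. The result depends on the release and on which archive areas
are enabled.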

I am interested in packaging the "breakin" part, but my time is limited
and I would appreciate any help. I can sponsor a package too, and will try
to inform upstream of our efforts.



Information forwarded to debian-bugs-dist@lists.debian.org, wnpp@debian.org, "Antoine Beaupré" <anarcat@debian.org>:
Bug#707178; Package wnpp. (Wed, 08 May 2013 00:03:09 GMT) (full text, mbox, link).


Acknowledgement sent to Matt Taggart <taggart@debian.org>:
Extra info received and forwarded to list. Copy sent to wnpp@debian.org, "Antoine Beaupré" <anarcat@debian.org>. (Wed, 08 May 2013 00:03:09 GMT) (full text, mbox, link).


Message #10 received at 707178@bugs.debian.org (full text, mbox, reply):

From: Matt Taggart <taggart@debian.org>
To: Antoine Beaupré <anarcat@debian.org>, 707178@bugs.debian.org
Subject: Re: Bug#707178: ITP: breakin -- stress-test and hardware diagnostics tool
Date: Tue, 07 May 2013 16:54:14 -0700
Hi,

I investigated some of the not-already-in-debian dependencies:

Antoine Beaupré writes:

> actdmi

http://git.advancedclustering.com/git/actdmi.git

An advancedclustering fork of dmidecode, forked sometime around version
2.5. Not sure why they forked or what they changed, a diff to upstream
of the era isn't obvious. Some more investigation will be needed here.

> arecacli

Some sort of tool for controlling areca HBAs?

http://www.areca.us/support/s_linux/cli/x86_64/cli64.zip

> breakin

The actual breakin tool

http://git.advancedclustering.com/git/breakin.git

> hpl

http://www.netlib.org/benchmark/hpl/

The hpcc package appears to depend on this too but expects you to install
it by hand from upstream.

> initfs

Refers to

http://git.advancedclustering.com/git/bootimage-initfs.git

Tools for creating the initfs the bootimage uses.

> kernel

Refers to upstream kernel source from

http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.38.2.tar.bz2

> mdinfo

Simple tool for gathering info about md raid arrays.

http://git.advancedclustering.com/git/mdinfo.git

> stream

CPU/Memory benchmark

http://www.cs.virginia.edu/stream/

the hpcc package also uses this (also expected to be installed by hand).

> terminfo

Copies things from /usr/share/terminfo and /lib/terminfo
Provided by ncurses-base and ncurses-term packages.

afio is also listed as a dependency, but is currently in non-free
(#509287). I haven't investigated what it's using it for.

-- 
Matt Taggart
taggart@debian.org



Information forwarded to debian-bugs-dist@lists.debian.org, wnpp@debian.org, "Antoine Beaupré" <anarcat@debian.org>:
Bug#707178; Package wnpp. (Wed, 15 May 2013 23:42:07 GMT) (full text, mbox, link).


Acknowledgement sent to Antoine Beaupré <anarcat@debian.org>:
Extra info received and forwarded to list. Copy sent to wnpp@debian.org, "Antoine Beaupré" <anarcat@debian.org>. (Wed, 15 May 2013 23:42:07 GMT) (full text, mbox, link).


Message #15 received at 707178@bugs.debian.org (full text, mbox, reply):

From: Antoine Beaupré <anarcat@debian.org>
To: taggart@debian.org
Cc: 707178@bugs.debian.org, dkg@fifthhorseman.net, jrollins@finestructure.net
Subject: start of a debirf stress-testing extension: "stressant"
Date: Wed, 15 May 2013 19:38:50 -0400
[Message part 1 (text/plain, inline)]
So the guts of this idea are around this ITP:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=707178

Our requirement at Koumbit.org is to have a tool that will do a series
of tests when we receive or want to verify new hardware. This would
ideally be in a debirf image, since those are easy to build and
share. A good inspiration of this is the "breakin" software, which the
above ITP is about.

I have started working on a silly debirf rescue image. I basically took
the rescue image from 0.33 and added this file in
rescue/modules/stressant:

debirf_exec apt-get --no-install-recommends --assume-yes install bonnie++ cpuburn stress smartmontools e2fsprogs util-linux screen

Quite trivial really. But to ease collaboration, I started a repo here:

https://redmine.koumbit.net/projects/stressant
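
For illustration, a module file such as rescue/modules/stressant might look
roughly like the sketch below. The fallback definition of debirf_exec and
the STRESSANT_PACKAGES variable are assumptions added so the file can be
dry-run outside a debirf build; inside a real build debirf is expected to
provide debirf_exec itself.

```shell
#!/bin/bash -e
# debirf module: stressant (sketch)
# Installs stress-testing tools into the image being built.

# Assumption: when debirf_exec is not provided by the environment, fall back
# to a stub that only prints the command instead of running it.
if ! command -v debirf_exec >/dev/null 2>&1; then
    debirf_exec() { echo "would run in image: $*"; }
fi

# Package list as given in the message above.
STRESSANT_PACKAGES="bonnie++ cpuburn stress smartmontools e2fsprogs util-linux screen"

# Intentionally unquoted so the list splits into one argument per package.
debirf_exec apt-get --no-install-recommends --assume-yes install $STRESSANT_PACKAGES
```

Keeping the package names in one variable makes it easy to trim the list
later without touching the apt-get invocation.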

Cheers,

A.

-- 
Power is not to be conquered, it is to be destroyed
                        - Jean-François Brient, de la servitude moderne
[Message part 2 (application/pgp-signature, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org, wnpp@debian.org, "Antoine Beaupré" <anarcat@debian.org>:
Bug#707178; Package wnpp. (Thu, 16 May 2013 14:30:23 GMT) (full text, mbox, link).


Acknowledgement sent to Daniel Kahn Gillmor <dkg@fifthhorseman.net>:
Extra info received and forwarded to list. Copy sent to wnpp@debian.org, "Antoine Beaupré" <anarcat@debian.org>. (Thu, 16 May 2013 14:30:23 GMT) (full text, mbox, link).


Message #20 received at 707178@bugs.debian.org (full text, mbox, reply):

From: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
To: Antoine Beaupré <anarcat@debian.org>
Cc: taggart@debian.org, 707178@bugs.debian.org, debirf@lists.mayfirst.org
Subject: Re: start of a debirf stress-testing extension: "stressant"
Date: Thu, 16 May 2013 10:26:13 -0400
[Message part 1 (text/plain, inline)]
[adding a cc: to the debirf mailing list [0]]

On 05/15/2013 07:38 PM, Antoine Beaupré wrote:
> So the guts of this idea are around this ITP:
> 
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=707178

interesting, thanks.

> I have started working on a silly debirf rescue image. I basically took
> the rescue image from 0.33 and added this file in
> rescue/modules/stressant:
> 
> debirf_exec apt-get --no-install-recommends --assume-yes install bonnie++ cpuburn stress smartmontools e2fsprogs util-linux screen

screen and smartmontools and e2fsprogs should already be part of
debirf's stock rescue.

> Quite trivial really. But to ease collaboration, I started a repo here:
> 
> https://redmine.koumbit.net/projects/stressant

we'd be happy to include these sorts of changes directly in the debirf
upstream repo, whether they're just packages you think belong in the
rescue image, or (even better in my opinion) if you want to make
"stressant" another stock example profile, alongside "minimal",
"rescue", and "xkiosk".

Feel free to clone the debirf repo [1] and send patches directly to the
mailing list.  We'd be happy to integrate them.

Regards,

	--dkg

[0] https://lists.mayfirst.org/mailman/listinfo/debirf
[1] git://finestructure.net/debirf or
git://lair.fifthhorseman.net/~dkg/debirf

[signature.asc (application/pgp-signature, attachment)]

Information forwarded to debian-bugs-dist@lists.debian.org, wnpp@debian.org, "Antoine Beaupré" <anarcat@debian.org>:
Bug#707178; Package wnpp. (Thu, 16 May 2013 15:12:04 GMT) (full text, mbox, link).


Acknowledgement sent to Antoine Beaupré <anarcat@debian.org>:
Extra info received and forwarded to list. Copy sent to wnpp@debian.org, "Antoine Beaupré" <anarcat@debian.org>. (Thu, 16 May 2013 15:12:04 GMT) (full text, mbox, link).


Message #25 received at 707178@bugs.debian.org (full text, mbox, reply):

From: Antoine Beaupré <anarcat@debian.org>
To: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
Cc: taggart@debian.org, 707178@bugs.debian.org, debirf@lists.mayfirst.org
Subject: Re: start of a debirf stress-testing extension: "stressant"
Date: Thu, 16 May 2013 11:08:51 -0400
[Message part 1 (text/plain, inline)]
On 2013-05-16 10:26:13, Daniel Kahn Gillmor wrote:
> [adding a cc: to the debirf mailing list [0]]
>
> On 05/15/2013 07:38 PM, Antoine Beaupré wrote:
>> I have started working on a silly debirf rescue image. I basically took
>> the rescue image from 0.33 and added this file in
>> rescue/modules/stressant:
>> 
>> debirf_exec apt-get --no-install-recommends --assume-yes install bonnie++ cpuburn stress smartmontools e2fsprogs util-linux screen
>
> screen and smartmontools and e2fsprogs should already be part of
> debirf's stock rescue.

alright, fixin'... fixed!

i wonder if util-linux is really necessary - since it's of priority
'required', are those installed automatically?

>> Quite trivial really. But to ease collaboration, I started a repo here:
>> 
>> https://redmine.koumbit.net/projects/stressant
>
> we'd be happy to include these sorts of changes directly in the debirf
> upstream repo, whether they're just packages you think belong in the
> rescue image, or (even better in my opinion) if you want to make
> "stressant" another stock example profile, alongside "minimal",
> "rescue", and "xkiosk".

That would be great!

However, one thing I am looking at is a way to generate binary images
directly through a package, something that has been discussed in #620294
I believe, but never implemented.

Also, I wonder if it wouldn't be better to set `stressant` as an example
of how it's possible to extend debirf without having to merge patches
into the upstream project itself, some kind of "contrib" way of doing
things that could radically expand the number of debirf-built flavors
out there, as they wouldn't depend on a central piece of authority to be
distributed or accepted.

> Feel free to clone the debirf repo [1] and send patches directly to the
> mailing list.  We'd be happy to integrate them.

So far, the changes to the rescue image are really minimal, it's just a
bunch of extra packages but otherwise very minimally changed from the
rescue build.

I will wait and see what you guys think about making a third party
extension before going on with a merge, especially since the project is
far from finished...

A.
-- 
Antoine Beaupré +++ Réseau Koumbit Networks +++ +1.514.387.6262 #208
--------------------------------------------------------------------
[Message part 2 (application/pgp-signature, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org, wnpp@debian.org, "Antoine Beaupré" <anarcat@debian.org>:
Bug#707178; Package wnpp. (Thu, 16 May 2013 15:24:04 GMT) (full text, mbox, link).


Acknowledgement sent to Daniel Kahn Gillmor <dkg@fifthhorseman.net>:
Extra info received and forwarded to list. Copy sent to wnpp@debian.org, "Antoine Beaupré" <anarcat@debian.org>. (Thu, 16 May 2013 15:24:04 GMT) (full text, mbox, link).


Message #30 received at 707178@bugs.debian.org (full text, mbox, reply):

From: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
To: Antoine Beaupré <anarcat@debian.org>
Cc: taggart@debian.org, 707178@bugs.debian.org, debirf@lists.mayfirst.org
Subject: Re: start of a debirf stress-testing extension: "stressant"
Date: Thu, 16 May 2013 11:20:32 -0400
[Message part 1 (text/plain, inline)]
On 05/16/2013 11:08 AM, Antoine Beaupré wrote:

> i wonder if util-linux is really necessary - since it's of priority
> 'required', are those installed automatically?

util-linux is also "Essential: yes", so i think that's superfluous too.
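
One way to check this on an installed system is sketched below. It relies on
dpkg-query's ${Essential} format field; the DPKG_QUERY variable and the
is_essential helper are hypothetical names introduced only for this example.

```shell
#!/bin/sh
# Sketch: report whether an installed package is marked "Essential: yes".
# DPKG_QUERY is a hypothetical override hook so the function can be tried
# without a real dpkg database.
DPKG_QUERY=${DPKG_QUERY:-dpkg-query}

is_essential() {
    # ${Essential} expands to "yes" for essential packages and is normally
    # empty otherwise; errors for unknown packages are treated as "no".
    [ "$("$DPKG_QUERY" -W -f='${Essential}' "$1" 2>/dev/null)" = "yes" ]
}
```

For instance, "is_essential util-linux && echo superfluous" would flag a
package that does not need to be listed explicitly.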

> That would be great!
> 
> However, one thing I am looking at is a way to generate binary images
> directly through a package, something that has been discussed in #620294
> I believe, but never implemented.

Cool, but we should keep that as a separate issue, i think.

> Also, I wonder if it wouldn't be better to set `stressant` as an example
> of how it's possible to extend debirf without having to merge patches
> into the upstream project itself, some kind of "contrib" way of doing
> things that could radically expand the number of debirf-built flavors
> out there, as they wouldn't depend on a central piece of authority to be
> distributed or accepted.

i don't want to play a gatekeeper role, and i'm happy to encourage other
contributions.  but i actually actively want something like the proposed
stressant in debirf.  I think it would be really useful.

> So far, the changes to the rescue image are really minimal, it's just a
> bunch of extra packages but otherwise very minimally changed from the
> rescue build.

if it's a small set of packages that arguably fit in with rescue, let's
fold them in there.

But i also like the idea of a stressant profile being something you boot
into (like memtest86+) that just automatically hammers on your hardware
in a reliable and repeatable fashion while reporting its conclusions in
a standardized way.
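
A minimal sketch of what "repeatable, with standardized reporting" could
mean in practice is shown below. The run_check helper and the
"RESULT <name> <PASS|FAIL> rc=<code>" line format are assumptions made for
this example, not anything debirf or stressant defines.

```shell
#!/bin/sh
# Sketch: run a named check and print one uniform result line for it.
# The output format is an assumption chosen for easy grepping.

run_check() {
    name=$1
    shift
    rc=0
    # Run the remaining arguments as the test command, discarding its output.
    "$@" >/dev/null 2>&1 || rc=$?
    if [ "$rc" -eq 0 ]; then
        status=PASS
    else
        status=FAIL
    fi
    printf 'RESULT %s %s rc=%d\n' "$name" "$status" "$rc"
    return "$rc"
}
```

It could then wrap real tools, for example
"run_check cpu stress --cpu 4 --timeout 60" or
"run_check smart smartctl -H /dev/sda" (the device path is only a
placeholder).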

> I will wait and see what you guys think about making a third party
> extension before going on with a merge, especially since the project is
> far from finished...

I'd personally rather integrate your suggestions with mainline debirf
than juggle it as a "contrib".

	--dkg

[signature.asc (application/pgp-signature, attachment)]

Information forwarded to debian-bugs-dist@lists.debian.org, wnpp@debian.org, "Antoine Beaupré" <anarcat@debian.org>:
Bug#707178; Package wnpp. (Thu, 16 May 2013 17:39:04 GMT) (full text, mbox, link).


Acknowledgement sent to Antoine Beaupré <anarcat@debian.org>:
Extra info received and forwarded to list. Copy sent to wnpp@debian.org, "Antoine Beaupré" <anarcat@debian.org>. (Thu, 16 May 2013 17:39:04 GMT) (full text, mbox, link).


Message #35 received at 707178@bugs.debian.org (full text, mbox, reply):

From: Antoine Beaupré <anarcat@debian.org>
To: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
Cc: taggart@debian.org, 707178@bugs.debian.org, debirf@lists.mayfirst.org
Subject: Re: start of a debirf stress-testing extension: "stressant"
Date: Thu, 16 May 2013 13:34:52 -0400
[Message part 1 (text/plain, inline)]
On 2013-05-16 11:20:32, Daniel Kahn Gillmor wrote:
> On 05/16/2013 11:08 AM, Antoine Beaupré wrote:
>
>> i wonder if util-linux is really necessary - since it's of priority
>> 'required', are those installed automatically?
>
> util-linux is also "Essential: yes", so i think that's superfluous too.

fixed.

>> However, one thing I am looking at is a way to generate binary images
>> directly through a package, something that has been discussed in #620294
>> I believe, but never implemented.
>
> Cool, but we should keep that as a separate issue, i think.

Agreed.

>> Also, I wonder if it wouldn't be better to set `stressant` as an example
>> of how it's possible to extend debirf without having to merge patches
>> into the upstream project itself, some kind of "contrib" way of doing
>> things that could radically expand the number of debirf-built flavors
>> out there, as they wouldn't depend on a central piece of authority to be
>> distributed or accepted.
>
> i don't want to play a gatekeeper role, and i'm happy to encourage other
> contributions.  but i actually actively want something like the proposed
> stressant in debirf.  I think it would be really useful.

Okay well how about this: I have tons of little tweaks to do to that
package, and I prefer to keep it small and local for now. When it is
complete and makes sense, we'll merge it in.

I do think it would be helpful to have documentation on how to create
third-party extensions like I did for stressant.

>> So far, the changes to the rescue image are really minimal, it's just a
>> bunch of extra packages but otherwise very minimally changed from the
>> rescue build.
>
> if it's a small set of packages that arguably fit in with rescue, let's
> fold them in there.

Okay. Then maybe I should reset our repo to be a fork of debirf instead
of standalone?

> But i also like the idea of a stressant profile being something you boot
> into (like memtest86+) that just automatically hammers on your hardware
> in a reliable and repeatable fashion while reporting its conclusions in
> a standardized way.

yes, that is the ultimate goal behind this ITP.

A.

-- 
To march in step to military music, there is no need for a
brain; a spinal cord is enough.
                        - Albert Einstein
[Message part 2 (application/pgp-signature, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org, wnpp@debian.org, "Antoine Beaupré" <anarcat@debian.org>:
Bug#707178; Package wnpp. (Thu, 16 May 2013 17:48:07 GMT) (full text, mbox, link).


Acknowledgement sent to Daniel Kahn Gillmor <dkg@fifthhorseman.net>:
Extra info received and forwarded to list. Copy sent to wnpp@debian.org, "Antoine Beaupré" <anarcat@debian.org>. (Thu, 16 May 2013 17:48:07 GMT) (full text, mbox, link).


Message #40 received at 707178@bugs.debian.org (full text, mbox, reply):

From: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
To: Antoine Beaupré <anarcat@debian.org>
Cc: taggart@debian.org, 707178@bugs.debian.org, debirf@lists.mayfirst.org
Subject: Re: start of a debirf stress-testing extension: "stressant"
Date: Thu, 16 May 2013 13:45:25 -0400
[Message part 1 (text/plain, inline)]
On 05/16/2013 01:34 PM, Antoine Beaupré wrote:
> Okay well how about this: I have tons of little tweaks to do to that
> package, and I prefer to keep it small and local for now. When it is
> complete and makes sense, we'll merge it in.

it's software; it will never be complete :)  but i'm fine with waiting
until it "makes sense" to you to merge it.

> I do think it would be helpful to have documentation on how to create
> third-party extensions like I did for stressant.

hm, yes, that's probably true.

> Okay. Then maybe I should reset our repo to be a fork of debirf instead
> of standalone?

yes, please, that would make it much easier to merge (in both
directions), and would also make it clearer which version of debirf
you're expecting to work from.

	--dkg

[signature.asc (application/pgp-signature, attachment)]

Information forwarded to debian-bugs-dist@lists.debian.org, wnpp@debian.org, "Antoine Beaupré" <anarcat@debian.org>:
Bug#707178; Package wnpp. (Fri, 17 Jan 2014 09:21:04 GMT) (full text, mbox, link).


Acknowledgement sent to Bryan Fisher <bryanf@pinnacle.co.za>:
Extra info received and forwarded to list. Copy sent to wnpp@debian.org, "Antoine Beaupré" <anarcat@debian.org>. (Fri, 17 Jan 2014 09:21:04 GMT) (full text, mbox, link).


Message #45 received at 707178@bugs.debian.org (full text, mbox, reply):

From: Bryan Fisher <bryanf@pinnacle.co.za>
To: Antoine Beaupré <anarcat@debian.org>, "taggart@debian.org" <taggart@debian.org>
Cc: "707178@bugs.debian.org" <707178@bugs.debian.org>, "dkg@fifthhorseman.net" <dkg@fifthhorseman.net>, "jrollins@finestructure.net" <jrollins@finestructure.net>
Subject: Breakin - stress-test and hardware diagnostics tool - Please see if you are able to assist to an issue we are having now for more than a month on 3 servers
Date: Fri, 17 Jan 2014 09:11:22 +0000
[Message part 1 (text/plain, inline)]
Good day,

My name is Bryan Fisher, and I work for a company called Pinnacle Africa in South Africa, Cape Town.

I was hoping that maybe you could assist me in the issue that I am getting with server h/w please.

Attached is a screenshot of what happens when I insert a USB key to copy the Breakin log file. It also shows the message 'Failid - Other tests have errors, tuning on ID light..'. Could you point me in a direction to find the fault, please?

I have run memtest multiple times and it passes. I have run a burn-in test in Windows and it passes. I have also run the Sandra tests and diagnostics, and they pass; voltages and system temperatures stayed stable while I monitored them.

However, the server works for a few weeks to a couple of months and then starts having issues such as reboots and freezes, and basically becomes unstable. We have changed mainboards and all RAM modules and tested the PSU, but the issue remains.

This is the server hardware in use:
ECC REG RAM - 8GB x4 modules
X9SRI-F - mainboard
E5-2620V2 2.1GHz 6C Ivy Bridge
1U Dual Xeon Chassis CSE-813MTQ-600CB

Please advise if you are able to assist or even just tell me where the issue might be.

I got your e-mail address from here:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=707178


Thank you kindly in advance! Hope to hear from you ASAP

Kind regards,


Bryan Fisher | Server Specialist
Cape Town | Pinnacle Africa
Direct: +27 21 5500 357 | Fax: +27 21 551 3444


[Message part 2 (text/html, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org, wnpp@debian.org, "Antoine Beaupré" <anarcat@debian.org>:
Bug#707178; Package wnpp. (Fri, 17 Jan 2014 09:24:04 GMT) (full text, mbox, link).


Acknowledgement sent to Bryan Fisher <bryanf@pinnacle.co.za>:
Extra info received and forwarded to list. Copy sent to wnpp@debian.org, "Antoine Beaupré" <anarcat@debian.org>. (Fri, 17 Jan 2014 09:24:05 GMT) (full text, mbox, link).


Message #50 received at 707178@bugs.debian.org (full text, mbox, reply):

From: Bryan Fisher <bryanf@pinnacle.co.za>
To: Antoine Beaupré <anarcat@debian.org>, "taggart@debian.org" <taggart@debian.org>
Cc: "707178@bugs.debian.org" <707178@bugs.debian.org>, "dkg@fifthhorseman.net" <dkg@fifthhorseman.net>, "jrollins@finestructure.net" <jrollins@finestructure.net>
Subject: RE: Breakin - stress-test and hardware diagnostics tool - Please see if you are able to assist to an issue we are having now for more than a month on 3 servers
Date: Fri, 17 Jan 2014 09:13:10 +0000
[Message part 1 (text/plain, inline)]
Sorry here's the attachment


[Message part 2 (text/html, inline)]
[Breakin_error.jpg (image/jpeg, attachment)]

Information forwarded to debian-bugs-dist@lists.debian.org, wnpp@debian.org, "Antoine Beaupré" <anarcat@debian.org>:
Bug#707178; Package wnpp. (Fri, 17 Jan 2014 18:18:04 GMT) (full text, mbox, link).


Acknowledgement sent to Daniel Kahn Gillmor <dkg@fifthhorseman.net>:
Extra info received and forwarded to list. Copy sent to wnpp@debian.org, "Antoine Beaupré" <anarcat@debian.org>. (Fri, 17 Jan 2014 18:18:04 GMT) (full text, mbox, link).


Message #55 received at 707178@bugs.debian.org (full text, mbox, reply):

From: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
To: Bryan Fisher <bryanf@pinnacle.co.za>, Antoine Beaupré <anarcat@debian.org>, "taggart@debian.org" <taggart@debian.org>
Cc: "707178@bugs.debian.org" <707178@bugs.debian.org>, "jrollins@finestructure.net" <jrollins@finestructure.net>
Subject: Re: Breakin - stress-test and hardware diagnostics tool - Please see if you are able to assist to an issue we are having now for more than a month on 3 servers
Date: Fri, 17 Jan 2014 13:14:40 -0500
[Message part 1 (text/plain, inline)]
Hi Bryan--

On 01/17/2014 04:13 AM, Bryan Fisher wrote:

> My name is Bryan Fisher, and I work for a company called Pinnacle Africa in South Africa, Cape Town.
> 
> I was hoping that maybe you could assist me in the issue that I am getting with server h/w please.

I think you're asking about something unrelated to what
http://bugs.debian.org/707178 is talking about.  The people that you've
e-mailed don't have anything to do with the breakin project, and we
can't support your organization's hardware at any rate.

I recommend you follow up with your hardware vendor (or retain other
local technical staff), and explain to them the errors that you're
having.  But your post is off-topic for the discussion of
http://bugs.debian.org/707178.

Regards,

	--dkg

[signature.asc (application/pgp-signature, attachment)]

Information forwarded to debian-bugs-dist@lists.debian.org, wnpp@debian.org, "Antoine Beaupré" <anarcat@debian.org>:
Bug#707178; Package wnpp. (Fri, 17 Jan 2014 19:54:04 GMT) (full text, mbox, link).


Acknowledgement sent to Andreas Cadhalpun <andreas.cadhalpun@googlemail.com>:
Extra info received and forwarded to list. Copy sent to wnpp@debian.org, "Antoine Beaupré" <anarcat@debian.org>. (Fri, 17 Jan 2014 19:54:04 GMT) (full text, mbox, link).


Message #60 received at 707178@bugs.debian.org (full text, mbox, reply):

From: Andreas Cadhalpun <andreas.cadhalpun@googlemail.com>
To: Bryan Fisher <bryanf@pinnacle.co.za>
Cc: 707178@bugs.debian.org, 733565@bugs.debian.org, Antoine Beaupré <anarcat@debian.org>, "taggart@debian.org" <taggart@debian.org>, "dkg@fifthhorseman.net" <dkg@fifthhorseman.net>, "jrollins@finestructure.net" <jrollins@finestructure.net>
Subject: Re: Bug#707178: Breakin - stress-test and hardware diagnostics tool - Please see if you are able to assist to an issue we are having now for more than a month on 3 servers
Date: Fri, 17 Jan 2014 20:51:10 +0100
Hi Bryan,

On 17.01.2014 10:13, Bryan Fisher wrote:
> I was hoping that maybe you could assist me in the issue that I am
> getting with server h/w please.
>
> Attached is a screenshot of what happens when I insert a USB key to copy
> the Breakin log file. It also indicates that ‘Failid – Other tests have
> errors, tuning on ID light..’ would it be possible if you could point me
> in a direction to find the fault please?

The error message (repeated 3 times) I read from the screenshot is:
kernel: [ 3009.877308] sd 9:0:0:0: [sdb] No Caching mode page present
kernel: [ 3009.877311] sd 9:0:0:0: [sdb] Assuming drive cache: write through

This reminds me of bug #733565 [1], which is about a request to silence 
these error messages.
I have seen similar messages and they seem to be totally harmless and 
have nothing to do with hardware failure.
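
If it helps, the two messages above can be filtered out of a log so that
anything else stands out. This is only a sketch: the filter_known_benign
name is made up, and the pattern list covers nothing beyond the two
messages quoted here.

```shell
#!/bin/sh
# Sketch: drop the two messages quoted above from a kernel log read on
# stdin. The pattern list is an assumption and is deliberately short.
filter_known_benign() {
    grep -v -F \
        -e 'No Caching mode page present' \
        -e 'Assuming drive cache: write through'
}
```

A typical use would be "dmesg | filter_known_benign", which prints only the
remaining kernel messages.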

Best regards,
Andreas


1: http://bugs.debian.org/733565



Information forwarded to debian-bugs-dist@lists.debian.org, wnpp@debian.org, "Antoine Beaupré" <anarcat@debian.org>:
Bug#707178; Package wnpp. (Mon, 20 Jan 2014 06:42:05 GMT) (full text, mbox, link).


Acknowledgement sent to Bryan Fisher <bryanf@pinnacle.co.za>:
Extra info received and forwarded to list. Copy sent to wnpp@debian.org, "Antoine Beaupré" <anarcat@debian.org>. (Mon, 20 Jan 2014 06:42:05 GMT) (full text, mbox, link).


Message #65 received at 707178@bugs.debian.org (full text, mbox, reply):

From: Bryan Fisher <bryanf@pinnacle.co.za>
To: Andreas Cadhalpun <andreas.cadhalpun@googlemail.com>
Cc: "707178@bugs.debian.org" <707178@bugs.debian.org>, "733565@bugs.debian.org" <733565@bugs.debian.org>, Antoine Beaupré <anarcat@debian.org>, "taggart@debian.org" <taggart@debian.org>, "dkg@fifthhorseman.net" <dkg@fifthhorseman.net>, "jrollins@finestructure.net" <jrollins@finestructure.net>
Subject: RE: Bug#707178: Breakin - stress-test and hardware diagnostics tool - Please see if you are able to assist to an issue we are having now for more than a month on 3 servers
Date: Mon, 20 Jan 2014 06:35:57 +0000
Thank you very much Andreas!

Kind regards,

Bryan


-----Original Message-----
From: Andreas Cadhalpun [mailto:andreas.cadhalpun@googlemail.com] 
Sent: 17 January 2014 09:51 PM
To: Bryan Fisher
Cc: 707178@bugs.debian.org; 733565@bugs.debian.org; Antoine Beaupré; taggart@debian.org; dkg@fifthhorseman.net; jrollins@finestructure.net
Subject: Re: Bug#707178: Breakin - stress-test and hardware diagnostics tool - Please see if you are able to assist to an issue we are having now for more than a month on 3 servers

Hi Bryan,

On 17.01.2014 10:13, Bryan Fisher wrote:
> I was hoping that maybe you could assist me in the issue that I am 
> getting with server h/w please.
>
> Attached is a screenshot of what happens when I insert a USB key to 
> copy the Breakin log file. It also indicates that 'Failid - Other 
> tests have errors, tuning on ID light..' would it be possible if you 
> could point me in a direction to find the fault please?

The error message (repeated 3 times) I read from the screenshot is:
kernel: [ 3009.877308] sd 9:0:0:0: [sdb] No Caching mode page present
kernel: [ 3009.877311] sd 9:0:0:0: [sdb] Assuming drive cache: write through

This reminds me of bug #733565 [1], which is about a request to silence these error messages.
I have seen similar messages and they seem to be totally harmless and have nothing to do with hardware failure.
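
For anyone sifting through a similar log, the two messages can be separated from the rest of the kernel output with a few lines of Python. This is only a minimal sketch: the pattern list and the name `split_benign` are hypothetical and belong to neither Breakin nor the kernel, and it assumes that only the two messages quoted above are to be treated as harmless.

```python
import re

# Assumption: only the two messages quoted from the screenshot are
# considered harmless. Extend this list with care.
BENIGN_PATTERNS = [
    re.compile(r"No Caching mode page present"),
    re.compile(r"Assuming drive cache: write through"),
]


def split_benign(lines):
    """Return (benign, other) lists of kernel log lines."""
    benign, other = [], []
    for line in lines:
        if any(p.search(line) for p in BENIGN_PATTERNS):
            benign.append(line)
        else:
            other.append(line)
    return benign, other


if __name__ == "__main__":
    sample = [
        "kernel: [ 3009.877308] sd 9:0:0:0: [sdb] No Caching mode page present",
        "kernel: [ 3009.877311] sd 9:0:0:0: [sdb] Assuming drive cache: write through",
    ]
    benign, other = split_benign(sample)
    print(len(benign), "benign,", len(other), "other")
```

Whatever ends up in the second list is what deserves a closer look when hunting for an actual hardware fault.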

Best regards,
Andreas


1: http://bugs.debian.org/733565



Information forwarded to debian-bugs-dist@lists.debian.org, wnpp@debian.org, "Antoine Beaupré" <anarcat@debian.org>:
Bug#707178; Package wnpp. (Thu, 18 Sep 2014 18:21:08 GMT) (full text, mbox, link).


Acknowledgement sent to Antoine Beaupré <anarcat@debian.org>:
Extra info received and forwarded to list. Copy sent to wnpp@debian.org, "Antoine Beaupré" <anarcat@debian.org>. (Thu, 18 Sep 2014 18:21:08 GMT) (full text, mbox, link).


Message #70 received at 707178@bugs.debian.org (full text, mbox, reply):

From: Antoine Beaupré <anarcat@debian.org>
To: Matt Taggart <taggart@debian.org>, 707178@bugs.debian.org
Subject: Re: Bug#707178: ITP: breakin -- stress-test and hardware diagnostics tool
Date: Thu, 18 Sep 2014 14:16:33 -0400
Control: retitle -1 RFP: breakin -- stress-test and hardware diagnostics tool
On 2013-05-07 19:54:14, Matt Taggart wrote:
>> breakin
>
> The actual breakin tool
>
> http://git.advancedclustering.com/git/breakin.git

For the record, those URLs changed:

http://git.advancedclustering.com/cgi-bin/gitweb.cgi

Also, I have made some progress with Stressant: the "motd" now prints a
crude script that runs most of the benchmarks. It will, however, wipe
drives (!) so it definitely needs some tuning...

See this issue for followup: https://redmine.koumbit.net/issues/15443

I am unlikely to package breakin itself at this point, as all of this
can be accomplished (and better) with standard tools, but others are
welcome to try!

A.
-- 
Nothing in life is to be feared, it is only to be understood
Now is the time to understand more, so that we may fear less.
                         - Marie Curie

Changed Bug title to 'RFP: breakin -- stress-test and hardware diagnostics tool' from 'ITP: breakin -- stress-test and hardware diagnostics tool' Request was from Antoine Beaupré <anarcat@debian.org> to 707178-submit@bugs.debian.org. (Thu, 18 Sep 2014 18:21:08 GMT) (full text, mbox, link).


Removed annotation that Bug was owned by "Antoine Beaupré" <anarcat@debian.org>. Request was from Bart Martens <bartm@quantz.debian.org> to control@bugs.debian.org. (Fri, 19 Sep 2014 04:27:05 GMT) (full text, mbox, link).


Reply sent to Antoine Beaupré <anarcat@debian.org>:
You have taken responsibility. (Sat, 18 Mar 2017 19:03:05 GMT) (full text, mbox, link).


Notification sent to Antoine Beaupré <anarcat@debian.org>:
Bug acknowledged by developer. (Sat, 18 Mar 2017 19:03:05 GMT) (full text, mbox, link).


Message #79 received at 707178-close@bugs.debian.org (full text, mbox, reply):

From: Antoine Beaupré <anarcat@debian.org>
To: 707178-close@bugs.debian.org
Subject: Bug#707178: fixed in stressant 0.3.1
Date: Sat, 18 Mar 2017 19:00:11 +0000
Source: stressant
Source-Version: 0.3.1

We believe that the bug you reported is fixed in the latest version of
stressant, which is due to be installed in the Debian FTP archive.

A summary of the changes between this version and the previous one is
attached.

Thank you for reporting the bug, which will now be closed.  If you
have further comments please address them to 707178@bugs.debian.org,
and the maintainer will reopen the bug report if appropriate.

Debian distribution maintenance software
pp.
Antoine Beaupré <anarcat@debian.org> (supplier of updated stressant package)

(This message was generated automatically at their request; if you
believe that there is a problem with it please contact the archive
administrators by mailing ftpmaster@ftp-master.debian.org)


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Format: 1.8
Date: Fri, 17 Mar 2017 12:02:16 -0400
Source: stressant
Binary: stressant
Architecture: source amd64
Version: 0.3.1
Distribution: unstable
Urgency: medium
Maintainer: Antoine Beaupré <anarcat@debian.org>
Changed-By: Antoine Beaupré <anarcat@debian.org>
Description:
 stressant  - simple stress testing and burn-in tool
Closes: 707178
Changes:
 stressant (0.3.1) unstable; urgency=medium
 .
   * fix copyright file
 .
 stressant (0.3.0) unstable; urgency=medium
 .
   * rebuild the whole project using Grml
   * rewrite and complete prototype in Python
   * email, logfile and color support
   * network tests with iPerf3
   * disk tests with fio
   * CPU tests with stress-ng
   * basic information gathered with lshw and smart
 .
 stressant (0.2) experimental; urgency=medium
 .
   * switch to using vmdebootstrap to build ISO images
   * add first prototype of test run, not packaged
   * switch to native packaging
   * Initial release (Closes: #707178)
Checksums-Sha1:
 61f4ec9441de71576d1ac95b9ad049297e3def80 1623 stressant_0.3.1.dsc
 7afeecb19656c4815505d7aeddde4bbaac2b496a 31588 stressant_0.3.1.tar.xz
 4e9013ce704760e26ee80180e428ce3dfa4b5d33 6035 stressant_0.3.1_amd64.buildinfo
 095cef46dd3b0ea44d722c339ab13ca239ca665f 19756 stressant_0.3.1_amd64.deb
Checksums-Sha256:
 04931f177b5528577acd3878afd02e269c8bac1aee55fcc873bc0c33a4a1f251 1623 stressant_0.3.1.dsc
 f4fc5d1437655eb776ba959bbca8e49a4482198c1aa0a840f556f75c9e4aa494 31588 stressant_0.3.1.tar.xz
 f1c56c4d08e2ada7257b0985272c8c4c2bb2fccb38b93d8ef380e4312c7ec44b 6035 stressant_0.3.1_amd64.buildinfo
 4895d5198b87faf401bdd7d2e58d012909eec41c74e0acff3631b18d50dea927 19756 stressant_0.3.1_amd64.deb
Files:
 b43d3e8b21d5b753709bf0db41b6b601 1623 admin optional stressant_0.3.1.dsc
 c13f5d0057f6bf3dde2c6e09e9bc3ea6 31588 admin optional stressant_0.3.1.tar.xz
 02da95dd6076095b2155557eb99c4139 6035 admin optional stressant_0.3.1_amd64.buildinfo
 3d26f1f2ad19f17bb4900f03d74c0498 19756 admin optional stressant_0.3.1_amd64.deb

-----BEGIN PGP SIGNATURE-----

iQIzBAEBCAAdFiEEjckBzmQUbASK1Q+7eSFSUnt1kh4FAljMCbMACgkQeSFSUnt1
kh7kjRAAvJ4Jq+g2Ty7RCjPaRwe9fjQ1ObIWr+OezmgCiflJ+xK0fA148OW//8uu
prXd7yH7Zpl130Uqne+/eVywOBp/w47PPiQ78JucdjpH9QjO5+4dhchptYImAFZk
WTH3mqpZKL7jljPvvemTiEcsZ3MOegqwNN29AGd2E4tVl/nuXmCwadcfCgTjmSJR
iToxL0glTzsDr8cti7iOucgvMju4vzwnd9Lg5AycLQJNLI6IdpGXhGf5EP/z1JrV
CbhTz+pa9Nd/oe7QPD5KSXWeC1mLvY+1g8ujEJLjy4gK9bu9vEhyuONnT1CZLYkp
0L2PZKjUYUUuhAIM+qCgzSmKAJLIIRpKI477piEp50+nMOxtqdoAsvahHE0sDslq
7i0ei4nZXwREb4SwWH80WvCjXA9YWII4fORN/8NKct9++4mj29dyEz6TsWg10BBD
zjoFR+Pyeb0DSJJ3iO42qNojle0dPGEhbCXwX2k0P18YT0PeI4S6g1IWShb4COQY
oI4zcuv1qiSkxkFulo9SOZM0WdJFyqSRKbDs9W1kWBBunU93NH41MbetoCszUcP2
6dW0JzTKb7l008FXZhgzyrqYoSyYHQfrbHXutI2WuitaHfVa1HhL7pQ7wLGOyhpk
79CW9kZDG4BGlICA4pO1rv7nIJ0AVDghrCQbKnmbFb6VFGii37I=
=yMWl
-----END PGP SIGNATURE-----




Information forwarded to debian-bugs-dist@lists.debian.org, wnpp@debian.org:
Bug#707178; Package wnpp. (Sun, 19 Mar 2017 19:42:09 GMT) (full text, mbox, link).


Acknowledgement sent to Antoine Beaupre <anarcat@debian.org>:
Extra info received and forwarded to list. Copy sent to wnpp@debian.org. (Sun, 19 Mar 2017 19:42:09 GMT) (full text, mbox, link).


Message #84 received at 707178@bugs.debian.org (full text, mbox, reply):

From: Antoine Beaupre <anarcat@debian.org>
To: 707178@bugs.debian.org
Cc: Daniel Kahn Gillmor <dkg@fifthhorseman.net>, Matt Taggart <taggart@debian.org>
Subject: update on the stressant and breakin packages
Date: Sun, 19 Mar 2017 15:40:28 -0400
Hi all,

As those monitoring this bug report may have noticed, I have closed the
WNPP bug for the packaging of the "Breakin" tool into Debian.

In its place, I have uploaded the "Stressant" package which is a "simple
stress testing and burn-in tool". To quote the package description
further:

 stressant is designed to run on new machines to make sure they will
 work reliably by testing various parts of the system (CPU, RAM, disk,
 network) by putting them under heavy load and try to detect failures.

 As much as possible, stressant tries to reuse existing tools to perform
 the various tasks and aims to be run automatically.

Instead of trying to create another Debian Derivative (basically what
Breakin is doing and what Stressant *was* doing), I have shifted the
development focus into creating a wrapper script around basic
stress-testing software (currently fio, dd, hdparm, smartctl, stress-ng
and iperf3).
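
As an illustration of the first thing such a wrapper has to do, here is a minimal Python sketch that checks which of those tools are installed before anything is run. It is not the actual stressant code: the function names `find_tools` and `report` are hypothetical, and the only assumption is that the tools are looked up on $PATH.

```python
import shutil

# Tool names are the ones listed in the paragraph above.
TOOLS = ["fio", "dd", "hdparm", "smartctl", "stress-ng", "iperf3"]


def find_tools(names, which=shutil.which):
    """Map each tool name to its path on $PATH, or None if missing."""
    return {name: which(name) for name in names}


def report(found):
    """Return one human-readable status line per tool."""
    return [
        f"{name}: {path}" if path else f"{name}: MISSING"
        for name, path in found.items()
    ]


if __name__ == "__main__":
    for line in report(find_tools(TOOLS)):
        print(line)
```

The `which` parameter is there only so that the lookup can be replaced by a fake in tests; a real wrapper would then go on to run each tool that was found and collect its output.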

During its time deployed on Koumbit's servers, however, the Stressant
builds were mostly used to provide rescue tools over PXE netboot
environments. This part is now delegated to the excellent Grml
distribution, which already deals with creating a live recovery
environment based on Debian. It is my hope that, now that stressant has
entered Debian, it can be integrated into Grml as well and become part
of their regular build process.

Grml provides ISO images that can be burned on CD/DVD or copied to USB
sticks. It can be run directly from RAM and/or in "forensics" (AKA
"read-only") mode and can also boot off the network. I have engaged with
the Grml community in various forms to move the efforts from the
Stressant derivative into some form of upstream, which will be useful
for a larger community as well.

I still have to document how to switch an "old stressant" configuration
to the new system, but all the bits are there and migration should be
fairly smooth.

The source code is now available on Gitlab:

https://gitlab.com/anarcat/stressant

Details of the remaining issues for Grml integration are here:

https://gitlab.com/anarcat/stressant/blob/master/README.md#grml

More explanations on why Debirf was finally abandoned as a build tool
are here:

https://gitlab.com/anarcat/stressant/blob/master/README.md#debirf

I want to again express many thanks to the Debirf team for creating that
great project that was so useful in getting the first prototypes of
stressant up and running. Also thanks to the Grml team for providing
feedback and their openness in welcoming Stressant contributions so far.

I'd be happy to hear feedback from people who have expressed interest
in this tool. I'm developing this mostly in my free time now, but I
hope it will be useful in production for others and would be happy to
accompany organizations that wish to deploy this in the field and/or
customize it to fit their needs.

A.

-- 
Hasta la victoria siempre.

Bug archived. Request was from Debbugs Internal Request <owner@bugs.debian.org> to internal_control@bugs.debian.org. (Mon, 17 Apr 2017 07:28:18 GMT) (full text, mbox, link).



Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Thu Nov 21 23:45:50 2024; Machine Name: bembo
