Debian Bug report logs - #358696
Avoiding halt and rebooting blindly is dangerous

Package: nut; Maintainer for nut is Arnaud Quette <aquette@debian.org>; Source for nut is src:nut.

Reported by: Henrique de Moraes Holschuh <hmh@debian.org>

Date: Fri, 24 Mar 2006 00:33:02 UTC

Severity: grave

Tags: patch

Found in versions nut/2.0.3-4, 2.0.4-1

Fixed in version nut/2.0.4-2

Done: Arnaud Quette <aquette@debian.org>

Bug is archived. No further changes may be made.


Report forwarded to debian-bugs-dist@lists.debian.org, Arnaud Quette <aquette@debian.org>:
Bug#358696; Package nut. Full text and rfc822 format available.

Acknowledgement sent to "Daniel Richard G." <skunk@iskunk.org>:
New Bug report received and forwarded. Copy sent to Arnaud Quette <aquette@debian.org>. Full text and rfc822 format available.

Message #5 received at submit@bugs.debian.org (full text, mbox):

From: "Daniel Richard G." <skunk@iskunk.org>
To: submit@bugs.debian.org
Subject: nut: problem with init script "poweroff" command behavior
Date: Thu, 23 Mar 2006 19:05:04 -0500
Package: nut
Version: 2.0.3-4
Severity: important
Tags: patch

When the init script's "poweroff" command is called from
/etc/init.d/halt (via "ups-monitor poweroff"): If the UPS does not cut
power immediately, control will return to the halt script, which will
subsequently power off the system.

This can happen in a number of scenarios: the driver fails to transmit
the shutdown command, the UPS waits for a short while before actually
cutting the power (offdelay), line power returns and the UPS can't/won't
cut power, etc.

Of course, having the system switch itself off is bad, because then it
will not automatically turn back on when supplied with power again
(assuming the reasonable BIOS default of "On/Off state: Last state").

I am attaching a proposed patch that yields a better behavior, and
follows the advice given in the NUT docs (shutdown.txt): The "poweroff"
command invokes "upsdrvctl shutdown", and then whether or not that
succeeds, it waits for a configurable length of time (15 minutes seems
like a good default), and reboots. I made some minor changes to the 
terminal output, too, taking into account e.g. the large blurb of text 
produced by the upsdrvctl invocation.
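
The behaviour described above can be sketched roughly as follows. This is not the attached nut.patch (which is not reproduced in this log), only an illustration. The names POWEROFF_WAIT, UPS_SHUTDOWN_CMD, REBOOT_CMD and ups_poweroff are assumptions made for this sketch, and the "15m" form relies on a sleep that accepts a unit suffix, as GNU sleep does.

```shell
#!/bin/sh
# Illustrative sketch only. All variable and function names are hypothetical.
POWEROFF_WAIT="${POWEROFF_WAIT:-15m}"                       # how long to wait for the UPS to cut power
UPS_SHUTDOWN_CMD="${UPS_SHUTDOWN_CMD:-upsdrvctl shutdown}"  # asks the driver to power off the UPS load
REBOOT_CMD="${REBOOT_CMD:-reboot}"                          # what to do if power was never cut

ups_poweroff() {
    echo "Asking the UPS to cut power..."
    # Whether or not the command succeeds, fall through to the wait.
    if ! $UPS_SHUTDOWN_CMD; then
        echo "UPS shutdown command failed; waiting anyway."
    fi
    echo "Waiting $POWEROFF_WAIT for the UPS to cut power..."
    sleep "$POWEROFF_WAIT"
    # Still running here means the UPS never cut power, so come back up.
    echo "Power was not cut; rebooting."
    $REBOOT_CMD
}
```

The function is only defined here, not called; an init script would invoke it from its "poweroff" case. Keeping the commands in variables makes it possible to try the flow with harmless stand-ins before pointing it at a real UPS.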


P.S.: I think it would be helpful to add a note to nut's README.Debian
file reminding the user that the HALT variable (in /etc/default/halt)
must be set to "poweroff" in order for /etc/init.d/halt to invoke
/etc/init.d/ups-monitor, when using the default SHUTDOWNCMD of "shutdown
-h +0". I had set "HALT=halt", thinking that this was what I wanted, but
changed it back after examining the halt script.
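
A small check along these lines could back such a README note. It is a sketch that assumes /etc/default/halt is a shell fragment that can be sourced; the function name halt_invokes_ups_monitor is made up for this illustration.

```shell
#!/bin/sh
# Illustrative sketch only. The function name is hypothetical.
# Succeeds only if the given defaults file sets HALT=poweroff.
# The body runs in a subshell so that sourcing the file does not
# leak variables into the caller.
halt_invokes_ups_monitor() (
    defaults_file="${1:-/etc/default/halt}"
    HALT=""
    if [ -r "$defaults_file" ]; then
        . "$defaults_file"
    fi
    [ "$HALT" = "poweroff" ]
)
```

For example, `halt_invokes_ups_monitor || echo "HALT is not set to poweroff"` would print a warning on a system where HALT=halt had been set.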

P.S.2: The nut init script contains an instance of / \t/....
[nut.patch (text/plain, attachment)]

Reply sent to Arnaud Quette <aquette@debian.org>:
You have taken responsibility. Full text and rfc822 format available.

Notification sent to "Daniel Richard G." <skunk@iskunk.org>:
Bug acknowledged by developer. Full text and rfc822 format available.

Message #10 received at 358696-close@bugs.debian.org (full text, mbox):

From: Arnaud Quette <aquette@debian.org>
To: 358696-close@bugs.debian.org
Subject: Bug#358696: fixed in nut 2.0.4-1
Date: Fri, 28 Jul 2006 08:02:04 -0700
Source: nut
Source-Version: 2.0.4-1

We believe that the bug you reported is fixed in the latest version of
nut, which is due to be installed in the Debian FTP archive:

nut-cgi_2.0.4-1_i386.deb
  to pool/main/n/nut/nut-cgi_2.0.4-1_i386.deb
nut-dev_2.0.4-1_i386.deb
  to pool/main/n/nut/nut-dev_2.0.4-1_i386.deb
nut-snmp_2.0.4-1_i386.deb
  to pool/main/n/nut/nut-snmp_2.0.4-1_i386.deb
nut-usb_2.0.4-1_i386.deb
  to pool/main/n/nut/nut-usb_2.0.4-1_i386.deb
nut_2.0.4-1.diff.gz
  to pool/main/n/nut/nut_2.0.4-1.diff.gz
nut_2.0.4-1.dsc
  to pool/main/n/nut/nut_2.0.4-1.dsc
nut_2.0.4-1_i386.deb
  to pool/main/n/nut/nut_2.0.4-1_i386.deb
nut_2.0.4.orig.tar.gz
  to pool/main/n/nut/nut_2.0.4.orig.tar.gz



A summary of the changes between this version and the previous one is
attached.

Thank you for reporting the bug, which will now be closed.  If you
have further comments please address them to 358696@bugs.debian.org,
and the maintainer will reopen the bug report if appropriate.

Debian distribution maintenance software
pp.
Arnaud Quette <aquette@debian.org> (supplier of updated nut package)

(This message was generated automatically at their request; if you
believe that there is a problem with it please contact the archive
administrators by mailing ftpmaster@debian.org)


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Format: 1.7
Date: Thu, 27 Jul 2006 15:20:46 +0200
Source: nut
Binary: nut nut-usb nut-dev nut-snmp nut-cgi
Architecture: source i386
Version: 2.0.4-1
Distribution: unstable
Urgency: low
Maintainer: Arnaud Quette <aquette@debian.org>
Changed-By: Arnaud Quette <aquette@debian.org>
Description: 
 nut        - The core system of the nut - Network UPS Tools
 nut-cgi    - A web interface sub system for the nut - Network UPS Tools
 nut-dev    - Development files for the nut - Network UPS Tools
 nut-snmp   - A meta SNMP Driver subsystem for the nut - Network UPS Tools
 nut-usb    - USB Drivers subsystem for the nut - Network UPS Tools
Closes: 332846 354305 358696 359769 359801 366738 378818 378970
Changes: 
 nut (2.0.4-1) unstable; urgency=low
 .
   * New upstream release
     - driver do not use /var anymore when called for shutdown/poweroff
     (closes: #332846)
     - fixes the newhidups crash upon device reconnexion (closes: #354305,
     #359769)
   * debian/nut.preinst: syntax enhancement to avoid issue when the nut user
     already exists (closes: #378970)
   * debian/nut-cgi.postinst: fix wrong permissions (closes: #378818)
   * debian/po/cs.po: update Czech translation of nut debconf messages (closes:
     #366738)
   * debian/nut.default, debian/nut.init: "poweroff" workaround to deal with
     BIOS default of "On/Off state: Last state" and system halting itself
     before the UPS cuts power (closes: #358696)
   * debian/nut-usbups.rules: fix the broken udev rules (closes: #359801)
Files: 
 986f3db074da21572565f0fe4429d8b1 769 admin optional nut_2.0.4-1.dsc
 59b6f3038a5ef64c584913e72bd850a2 699424 admin optional nut_2.0.4.orig.tar.gz
 c1fded83ceb817655bc9680264f085be 28536 admin optional nut_2.0.4-1.diff.gz
 093f28f82947d7f0872db4492d302159 1016308 admin optional nut_2.0.4-1_i386.deb
 6a8ff6f08a64ec18cedfd7037fda9bc7 100346 admin optional nut-cgi_2.0.4-1_i386.deb
 9f07952bc7174126b50348bb4cafbfd1 81590 admin optional nut-snmp_2.0.4-1_i386.deb
 fd2f7b24d079729e64b69f50136e936b 183184 admin optional nut-usb_2.0.4-1_i386.deb
 32d0dcfa3499b4f35d0e84bb144afc2f 86910 admin optional nut-dev_2.0.4-1_i386.deb

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (GNU/Linux)

iD8DBQFEyiB822QUyiBN3xsRAmv4AJ4wm1EuAZKIbIhq4rcoCnK27B43egCgjHcJ
jkFLSJ+Fshglx1WjBD6gCOQ=
=BI5q
-----END PGP SIGNATURE-----




Information forwarded to debian-bugs-dist@lists.debian.org, Arnaud Quette <aquette@debian.org>:
Bug#358696; Package nut. Full text and rfc822 format available.

Acknowledgement sent to Henrique de Moraes Holschuh <hmh@debian.org>:
Extra info received and forwarded to list. Copy sent to Arnaud Quette <aquette@debian.org>. Full text and rfc822 format available.

Message #15 received at 358696@bugs.debian.org (full text, mbox):

From: Henrique de Moraes Holschuh <hmh@debian.org>
To: 358696@bugs.debian.org
Cc: "Daniel Richard G." <skunk@iskunk.org>
Subject: This is dangerous, please make it default to disabled
Date: Fri, 28 Jul 2006 14:46:47 -0300
Version: 2.0.4-1

reopen 358696 !
retitle 358696 Avoiding halt and rebooting blindly is dangerous
severity 358696 grave
found 2.0.4-1
thanks

The proposed solution endangers data and hardware, thus the grave severity.

1. The UPS may take more than 15 minutes to shutdown the load.  You cannot
assume things like this, and you will cause data loss if you get it wrong:
the power-off could come with the system fully online.

2. Not powering off the box by itself (read: allowing halt and the kernel to
do its job and cut power cleanly) means it will be subject to high
transients when the UPS shuts down the load.  This will, in turn, make it
worse for the other loads that have not been properly shut down.  It would
be a disaster in a server farm.

3. Non-controlled shutdowns are *very* bad for all hardware, including
desktop systems.  For starters, all disks will be subject to emergency head
unloads.  The halt utility does a lot of work-around on kernel bugs to make
sure disks are parked, RAID arrays are in read-only mode or shutdown, etc
for a damn good reason.

4. It is very probable that in any non-home scenarios, an UPS will protect
more than one equipment.  In those scenarios, the UPS is configured to NOT
accept "immediate shutdown the load" command from any of the equipments,
just from the main controller host.  Nut is geared to work fine and
specifically support such configurations.  This has to be taken into
account.

Thus, implementing the work around proposed in this bug report as a default
behaviour is not acceptable.  Please revert the change, or make it optional,
and *not* enabled by default.   I would go even further and actively
discourage heavily the use of this option, as it can damage the hardware.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh



Bug reopened, originator set to Henrique de Moraes Holschuh <hmh@debian.org>. Request was from Henrique de Moraes Holschuh <hmh@debian.org> to control@bugs.debian.org. Full text and rfc822 format available.

Changed Bug title. Request was from Henrique de Moraes Holschuh <hmh@debian.org> to control@bugs.debian.org. Full text and rfc822 format available.

Severity set to `grave' from `important' Request was from Henrique de Moraes Holschuh <hmh@debian.org> to control@bugs.debian.org. Full text and rfc822 format available.

Information forwarded to debian-bugs-dist@lists.debian.org, Arnaud Quette <aquette@debian.org>:
Bug#358696; Package nut. Full text and rfc822 format available.

Acknowledgement sent to "Daniel Richard G." <skunk@iSKUNK.ORG>:
Extra info received and forwarded to list. Copy sent to Arnaud Quette <aquette@debian.org>. Full text and rfc822 format available.

Message #26 received at 358696@bugs.debian.org (full text, mbox):

From: "Daniel Richard G." <skunk@iSKUNK.ORG>
To: 358696@bugs.debian.org
Cc: Henrique de Moraes Holschuh <hmh@debian.org>
Subject: Re: This is dangerous, please make it default to disabled
Date: Fri, 28 Jul 2006 14:24:00 -0400
On Fri, 2006 Jul 28 14:46:47 -0300, Henrique de Moraes Holschuh wrote:
> 
> 1. The UPS may take more than 15 minutes to shutdown the load.  You cannot
> assume things like this, and you will cause data loss if you get it wrong:
> the power-off could come with the system fully online.

The time period should be configurable; I just suggested 15 minutes as a 
default. You could set a higher value, but the tradeoff is that if the 
power returns, the system is unavailable for that time period.

> 2. Not powering off the box by itself (read: allowing halt and the kernel to
> do its job and cut power cleanly) means it will be subject to high
> transients when the UPS shuts down the load.  This will, in turn, make it
> worse for the other loads that have not been properly shut down.  It would
> be a disaster in a server farm.

Please elaborate on how server equipment is subjected to a transient when a 
UPS cuts power to it. (If anything, the situation is much worse when it is 
powered back on.)

> 3. Non-controlled shutdowns are *very* bad for all hardware, including
> desktop systems.  For starters, all disks will be subject to emergency head
> unloads.  The halt utility does a lot of work-around on kernel bugs to make
> sure disks are parked, RAID arrays are in read-only mode or shutdown, etc
> for a damn good reason.

All of which can be done (and already is, I believe). The only thing that 
the system is doing while waiting for poweroff is "sleep 15m; reboot"---no 
disks need to be spinning for that.

> 4. It is very probable that in any non-home scenarios, an UPS will protect
> more than one equipment.  In those scenarios, the UPS is configured to NOT
> accept "immediate shutdown the load" command from any of the equipments,
> just from the main controller host.  Nut is geared to work fine and
> specifically support such configurations.  This has to be taken into
> account.

Isn't this already the case for non-networked UPSes? When the interface is 
serial or USB, it can only be connected to (and controlled by) a single, 
master host.

> Thus, implementing the work around proposed in this bug report as a default
> behaviour is not acceptable.  Please revert the change, or make it optional,
> and *not* enabled by default.   I would go even further and actively
> discourage heavily the use of this option, as it can damage the hardware.

I think you'll take issue with the NUT documentation, then, as it 
specifically suggests this approach.


--Daniel



Bug marked as found in version 2.0.4-1. Request was from Henrique de Moraes Holschuh <hmh@debian.org> to control@bugs.debian.org. Full text and rfc822 format available.

Bug marked as not found in version 2.0.3-4. Request was from Henrique de Moraes Holschuh <hmh@debian.org> to control@bugs.debian.org. Full text and rfc822 format available.

Information forwarded to debian-bugs-dist@lists.debian.org, Arnaud Quette <aquette@debian.org>:
Bug#358696; Package nut. Full text and rfc822 format available.

Acknowledgement sent to Henrique de Moraes Holschuh <hmh@debian.org>:
Extra info received and forwarded to list. Copy sent to Arnaud Quette <aquette@debian.org>. Full text and rfc822 format available.

Message #35 received at 358696@bugs.debian.org (full text, mbox):

From: Henrique de Moraes Holschuh <hmh@debian.org>
To: "Daniel Richard G." <skunk@iSKUNK.ORG>
Cc: 358696@bugs.debian.org
Subject: Re: This is dangerous, please make it default to disabled
Date: Fri, 28 Jul 2006 16:12:35 -0300
On Fri, 28 Jul 2006, Daniel Richard G. wrote:
> The time period should be configurable; I just suggested 15 minutes as a 
> default. You could set a higher value, but the tradeoff is that if the 
> power returns, the system is unavailable for that time period.

There is no tradeoff without the hack, and the hack is only needed in
hardware unsuitable for UPS management.  Thus, it must be optional.  It is
dangerous to data and the hardware, so it should not be the default.

It is fairly simple, really, unless I missed something major (which is
always possible).

> > 2. Not powering off the box by itself (read: allowing halt and the kernel to
> > do its job and cut power cleanly) means it will be subject to high
> > transients when the UPS shuts down the load.  This will, in turn, make it
> > worse for the other loads that have not been properly shut down.  It would
> > be a disaster in a server farm.
> 
> Please elaborate on how server equipment is subjected to a transient when a 
> UPS cuts power to it. (If anything, the situation is much worse when it is 
> powered back on.)

You have transient responses to power cuts.  Watch in an oscilloscope,
computer hardware is not a resistive load.

The situation is bad when everything powers up at the same time too, yes.
That's why it isn't all powered up at once in server rooms, blade
enclosures, etc.

> > 3. Non-controlled shutdowns are *very* bad for all hardware, including
> > desktop systems.  For starters, all disks will be subject to emergency head
> > unloads.  The halt utility does a lot of work-around on kernel bugs to make
> > sure disks are parked, RAID arrays are in read-only mode or shutdown, etc
> > for a damn good reason.
> 
> All of which can be done (and already is, I believe). The only thing that 
> the system is doing while waiting for poweroff is "sleep 15m; reboot"---no 
> disks need to be spinning for that.

If you did not call halt, plus told the kernel to shutdown the devices, no,
it was *not* done.

And the kernel is the *only* thing that really knows how to properly
powerdown the devices.  Currently, we cannot ask it to do so from userspace
easily, and if we did, we could not access the disks anymore for example.

> Isn't this already the case for non-networked UPSes? When the interface is 
> serial or USB, it can only be connected to (and controlled by) a single, 
> master host.

The issue is how the initscript behaves if the NUT shutdown command doesn't
kill everything to kingdom come in 5 seconds.  In fact, a proper UPS is
going to be programmed to actually *delay* the powerdown load command for
enough time to allow the load to try to powerdown for real by itself.

> > Thus, implementing the work around proposed in this bug report as a default
> > behaviour is not acceptable.  Please revert the change, or make it optional,
> > and *not* enabled by default.   I would go even further and actively
> > discourage heavily the use of this option, as it can damage the hardware.
> 
> I think you'll take issue with the NUT documentation, then, as it 
> specifically suggests this approach.

I will.  But maybe, perchance, the NUT docs don't suggest you do it unless
you own hardware that cannot do it properly?  I didn't read it yet.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh



Information forwarded to debian-bugs-dist@lists.debian.org, Arnaud Quette <aquette@debian.org>:
Bug#358696; Package nut. Full text and rfc822 format available.

Acknowledgement sent to "Daniel Richard G." <skunk@iSKUNK.ORG>:
Extra info received and forwarded to list. Copy sent to Arnaud Quette <aquette@debian.org>. Full text and rfc822 format available.

Message #40 received at 358696@bugs.debian.org (full text, mbox):

From: "Daniel Richard G." <skunk@iSKUNK.ORG>
To: 358696@bugs.debian.org
Cc: Henrique de Moraes Holschuh <hmh@debian.org>
Subject: Re: This is dangerous, please make it default to disabled
Date: Fri, 28 Jul 2006 18:01:57 -0400
On Fri, 2006 Jul 28 16:12:35 -0300, Henrique de Moraes Holschuh wrote:
> 
> There is no tradeoff without the hack, and the hack is only needed in
> hardware unsuitable for UPS management.  Thus, it must be optional.  It is
> dangerous to data and the hardware, so it should not be the default.

Define "(un)suitable for UPS management." Does this definition include
most people's desktop systems?

> You have transient responses to power cuts.  Watch in an oscilloscope,
> computer hardware is not a resistive load.

No, but any decent power supply will present a load pretty close to it, 
making such a transient negligible. (I know this to be the case in 
production server-room environments.) If someone's got a rack setup where a 
UPS power cutoff will fry everything, they've got a much bigger problem 
than what we're discussing here.

> The situation is bad when everything powers up at the same time too, yes.
> That's why it isn't all powered up at once in server rooms, blade
> enclosures, etc.

Yes. No problem with wanting staggered shutdown, when you have a large 
number of machines connected, but large numbers of machines connected are 
not exactly a typical scenario.

> > All of which can be done (and already is, I believe). The only thing that 
> > the system is doing while waiting for poweroff is "sleep 15m; reboot"---no 
> > disks need to be spinning for that.
> 
> If you did not call halt, plus told the kernel to shutdown the devices, no,
> it was *not* done.
> 
> And the kernel is the *only* thing that really knows how to properly
> powerdown the devices.  Currently, we cannot ask it to do so from userspace
> easily, and if we did, we could not access the disks anymore for example.

We have "hdparm -Y". We can't access the disk after that, but we shouldn't 
need to. What more shutdown magic do you need on a hard disk that is not 
spinning?

If you're talking about a flaky hardware RAID array where you can't stop 
the platters without it self-destructing, then fine. I recall that the 
scripts check for RAID, and behave differently in that case.

> The issue is how the initscript behaves if the NUT shutdown command doesn't
> kill everything to kingdom come in 5 seconds.  In fact, a proper UPS is
> going to be programmed to actually *delay* the powerdown load command for
> enough time to allow the load to try to powerdown for real by itself.

Assuming things are as I had in my patch, the idea is to have all machines 
connected to a given UPS configured with a similar wait-until-poweroff- 
else-reboot time (if they don't shutdown straightaway).

This approach is admittedly not the best one---ideally you'd have some sort 
of statically-linked "death watch" daemon that would do the same thing, but 
also monitor the UPS, and broadcast an "online" signal if the power 
returns. You'd no longer have to configure any wait-until-poweroff time, 
and the aforementioned tradeoff goes away. But this is a wishlist item.
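
The polling half of such a "death watch" could look something like the sketch below. It is only an assumption of how the idea might be approached, not an existing NUT component: it polls the UPS status through upsc and reports when line power is back ("OL" in ups.status). The UPS name myups@localhost and the names UPS_STATUS_CMD, POLL_INTERVAL, MAX_POLLS and wait_for_line_power are hypothetical, and the poll limit is a safety bound added for this sketch rather than part of the idea.

```shell
#!/bin/sh
# Illustrative sketch only. All names below are hypothetical.
UPS_STATUS_CMD="${UPS_STATUS_CMD:-upsc myups@localhost ups.status}"
POLL_INTERVAL="${POLL_INTERVAL:-5}"   # seconds between polls
MAX_POLLS="${MAX_POLLS:-180}"         # safety bound so the loop cannot run forever

# Returns 0 as soon as the UPS reports it is on line power ("OL"),
# 1 if MAX_POLLS polls pass without that happening.
wait_for_line_power() {
    polls=0
    while [ "$polls" -lt "$MAX_POLLS" ]; do
        # A failed query is treated the same as "not on line power".
        status=$($UPS_STATUS_CMD 2>/dev/null) || status=""
        case " $status " in
            *" OL "*) return 0 ;;
        esac
        polls=$((polls + 1))
        sleep "$POLL_INTERVAL"
    done
    return 1
}
```

A caller could then reboot only when the function succeeds, for instance `wait_for_line_power && reboot`, instead of rebooting after a fixed delay. Broadcasting the "online" signal to other hosts is left out of this sketch.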

Anyway, the disagreement comes down to this:

Me: Keep the system minimally running, so that it powers off when the UPS 
cuts the power, so that it will turn on again when the power returns, given 
the default behavior and limitations of PC hardware. Do sensible steps to 
avoid data loss (stop the disks, etc.). Have this be the default, as PC 
users are the common case.

You: Do a normal system shutdown. Rely on server-grade features (e.g. WOL 
packet from a networked UPS) to resume operation, or an "On/Off state: ON" 
BIOS setting (despite the problems associated with that). Have this be the 
default, as the risk of data loss from fragile storage media trumps that of 
system unavailability after an extended outage.

Mr. Quette will have to decide this, but I don't think you've made a strong 
case for a power-cut being significantly detrimental to data or hardware. 
Yes, there are circumstances where this can happen, but these are 
exceptions to the rule. And in one well-known case (RAID arrays), the 
scripts can easily do something different.

> > I think you'll take issue with the NUT documentation, then, as it 
> > specifically suggests this approach.
> 
> I will.  But maybe, perchance, the NUT docs don't suggest you do it unless
> you own hardware that cannot do it properly?  I didn't read it yet.

I'm getting the impression that "hardware that cannot do it properly," as 
you mean it, includes most PCs and non-server machines. Your view carries 
the day if NUT's userbase is not mostly these.


--Daniel



Information forwarded to debian-bugs-dist@lists.debian.org, Arnaud Quette <aquette@debian.org>:
Bug#358696; Package nut. Full text and rfc822 format available.

Acknowledgement sent to "Arnaud Quette" <aquette.dev@gmail.com>:
Extra info received and forwarded to list. Copy sent to Arnaud Quette <aquette@debian.org>. Full text and rfc822 format available.

Message #45 received at 358696@bugs.debian.org (full text, mbox):

From: "Arnaud Quette" <aquette.dev@gmail.com>
To: 358696@bugs.debian.org
Cc: "Daniel Richard G." <skunk@iskunk.org>, "Henrique de Moraes Holschuh" <hmh@debian.org>
Subject: Re: Bug#358696: This is dangerous, please make it default to disabled
Date: Mon, 31 Jul 2006 11:28:06 +0200
Hi fellows,

2006/7/29, Daniel Richard G. <skunk@iskunk.org>:
> On Fri, 2006 Jul 28 16:12:35 -0300, Henrique de Moraes Holschuh wrote:
> >...
>
> Anyway, the disagreement comes down to this:
>
> Me: Keep the system minimally running, so that it powers off when the UPS
> cuts the power, so that it will turn on again when the power returns, given
> the default behavior and limitations of PC hardware. Do sensible steps to
> avoid data loss (stop the disks, etc.). Have this be the default, as PC
> users are the common case.
>
> You: Do a normal system shutdown. Rely on server-grade features (e.g. WOL
> packet from a networked UPS) to resume operation, or an "On/Off state: ON"
> BIOS setting (despite the problems associated with that). Have this be the
> default, as the risk of data loss from fragile storage media trumps that of
> system unavailability after an extended outage.
>
> Mr. Quette will have to decide this, but I don't think you've made a strong
> case for a power-cut being significantly detrimental to data or hardware.
> Yes, there are circumstances where this can happen, but these are
> exceptions to the rule. And in one well-known case (RAID arrays), the
> scripts can easily do something different.
>
> > > I think you'll take issue with the NUT documentation, then, as it
> > > specifically suggests this approach.
> >
> > I will.  But maybe, perchance, the NUT docs don't suggest you do it unless
> > you own hardware that cannot do it properly?  I didn't read it yet.
>
> I'm getting the impression that "hardware that cannot do it properly," as
> you mean it, includes most PCs and non-server machines. Your view carries
> the day if NUT's userbase is not mostly these.

The point you're talking about is a long standing problem I haven't
yet found a *perfect* solution for.

As you have both stated well, hardware differences, the huge number of
UPS setups and BIOS default configurations make it hard (or impossible)
to find The Solution.

Just to avoid misunderstanding: NUT relies by default upon hardware to
be halted, and (BIOS) configured to power on on AC restored.

I'll thus leave the patch in, but disable it in -2 (scheduled for release
by tomorrow), referencing the present thread as a WARNING.
When I get more time (I'm too busy at the moment with NUT bridging to
HAL, some major code rewrites and internal projects), I'll restart two
sub-projects (NPS - NUT Packaging Standard, and QA - Quality Assurance:
https://alioth.debian.org/pm/?group_id=30602) and try to find The
Solution. While the former will focus on NUT integration (ie halt
procedure), the latter will focus on reliability of the UPS poweroff
and such things (like finding upstream workaround for dumb UPSs to
address power races).

Thank you both for your constructive feedback, and don't hesitate to
add more comments.

Arnaud
-- 
Linux / Unix Expert - MGE UPS SYSTEMS - R&D Dpt
Network UPS Tools (NUT) Project Leader - http://www.networkupstools.org/
Debian Developer - http://people.debian.org/~aquette/
OpenSource Developer - http://arnaud.quette.free.fr/



Information forwarded to debian-bugs-dist@lists.debian.org, Arnaud Quette <aquette@debian.org>:
Bug#358696; Package nut. Full text and rfc822 format available.

Acknowledgement sent to Henrique de Moraes Holschuh <hmh@debian.org>:
Extra info received and forwarded to list. Copy sent to Arnaud Quette <aquette@debian.org>. Full text and rfc822 format available.

Message #50 received at 358696@bugs.debian.org (full text, mbox):

From: Henrique de Moraes Holschuh <hmh@debian.org>
To: "Daniel Richard G." <skunk@iSKUNK.ORG>
Cc: 358696@bugs.debian.org
Subject: Re: This is dangerous, please make it default to disabled
Date: Mon, 31 Jul 2006 13:47:21 -0300
On Fri, 28 Jul 2006, Daniel Richard G. wrote:
> On Fri, 2006 Jul 28 16:12:35 -0300, Henrique de Moraes Holschuh wrote:
> > There is no tradeoff without the hack, and the hack is only needed in
> > hardware unsuitable for UPS management.  Thus, it must be optional.  It is
> > dangerous to data and the hardware, so it should not be the default.
> 
> Define "(un)suitable for UPS management." Does this definition include
> most people's desktop systems?

Suitable for UPS management:
	Load:
		Powers up when AC returns
		Can be informed that it must shutdown by the UPS
			(through NUT).
	UPS:
		Does delayed load shutdown upon shutdown command
		Does not power up the load before it has enough charge
			to do a delayed shutdown, plus safety margin.
		Always power-cycles the load after a shutdown command is
			ACK'ed to the controlling host.  Even if AC
			returns, and it doesn't need to shutdown anymore.
		Notifies the host when battery charge is below a
			certain threshold, so that it can shutdown safely.
		Powers up the load if the batteries have enough charge,
			and an AC cycle happens while the load is offline.
		Powers up the load after a timer expires, if no AC cycles
			happen AND the load was brought offline by an explicit
			delayed shutdown command.

Anything else is unsuitable.  Any PC97 desktop should be suitable for proper
UPS management.  And just FYI, PC97 requires WoL on all ethernet devices,
not that you need WoL for a proper UPS setup, but you somehow got the idea
that WoL was a server-grade feature...

> > You have transient responses to power cuts.  Watch in an oscilloscope,
> > computer hardware is not a resistive load.
> 
> No, but any decent power supply will present a load pretty close to it, 

Only ones with PFC. 

> production server-room environments.) If someone's got a rack setup where a 
> UPS power cutoff will fry everything, they've got a much bigger problem 
> than what we're discussing here.

Yes.

> number of machines connected, but large numbers of machines connected are 
> not exactly a typical scenario.

No, but your hard-drive doing emergency unloads is a typical scenario, and
desktop HDs don't like those unloads *at* *all*.  Do not do it (and as I
already said, the only proper way to know the HD heads are unloaded requires
kernel cooperation, and it is NOT done by userspace currently). 

I know you were under the mistaken impression that we could guarantee all
HD heads were unloaded in userspace, and before halt runs.  We not only
cannot do it, we also do not *attempt* to do it.  The only thing in Debian
initscripts that really tries to take care of HD head unloads is the halt
command.

You can, of course, try to make sure hdparm was run and actually unloaded all
heads for your particular configuration, but it is not an acceptable
default, because we cannot get it right every time.  So implement it as an
admin-enabled, admin-configured option by all means.  But *not* as a
default.
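
An admin-enabled, admin-configured variant might be sketched as below. It assumes the admin lists the disks explicitly and that an empty list means the feature is off; the names SPINDOWN_DISKS, SPINDOWN_CMD and spin_down_disks are hypothetical. As noted above, "hdparm -Y" cannot be relied on for every disk, so the sketch reports any failure instead of assuming success.

```shell
#!/bin/sh
# Illustrative sketch only. All names below are hypothetical.
SPINDOWN_DISKS="${SPINDOWN_DISKS:-}"        # empty = feature disabled (the default)
SPINDOWN_CMD="${SPINDOWN_CMD:-hdparm -Y}"   # puts a drive into its sleep mode

# Returns 0 only if every listed disk accepted the command;
# returns 1 if the option is disabled or any disk failed.
spin_down_disks() {
    if [ -z "$SPINDOWN_DISKS" ]; then
        echo "Disk spin-down not configured; leaving it to halt."
        return 1
    fi
    failed=0
    for disk in $SPINDOWN_DISKS; do
        if ! $SPINDOWN_CMD "$disk" >/dev/null 2>&1; then
            echo "Could not spin down $disk"
            failed=1
        fi
    done
    [ "$failed" -eq 0 ]
}
```

With this shape, a script would only skip the normal halt path when spin_down_disks succeeds, and fall back to the stock behaviour otherwise.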

> > > All of which can be done (and already is, I believe). The only thing that 
> > > the system is doing while waiting for poweroff is "sleep 15m; reboot"---no 
> > > disks need to be spinning for that.
> > 
> > If you did not call halt, plus told the kernel to shutdown the devices, no,
> > it was *not* done.
> > 
> > And the kernel is the *only* thing that really knows how to properly
> > powerdown the devices.  Currently, we cannot ask it to do so from userspace
> > easily, and if we did, we could not access the disks anymore for example.
> 
> We have "hdparm -Y". We can't access the disk after that, but we shouldn't 
> need to. What more shutdown magic do you need on a hard disk that is not 
> spinning?

None, if the disk spun down.  But hdparm doesn't work for all disks, and we
cannot reliably spin down all disks and unload heads from userspace for all
possible configurations.  Thus, anything that relies on this cannot be made
a default.

> If you're talking about a flaky hardware RAID array where you can't stop 

SCSI plus all software RAID arrays.

> > The issue is how the initscript behaves if the NUT shutdown command doesn't
> > kill everything to kingdom come in 5 seconds.  In fact, a proper UPS is
> > going to be programmed to actually *delay* the powerdown load command for
> > enough time to allow the load to try to powerdown for real by itself.
> 
> Assuming things are as I had in my patch, the idea is to have all machines 
> connected to a given UPS configured with a similar wait-until-poweroff- 
> else-reboot time (if they don't shutdown straightaway).

The bad thing in your patch is that the maintainer made it non-optional, and
the default.  I understand it will not be a default anymore, which is enough
for me.

> Anyway, the disagreement comes down to this:
> 
> Me: Keep the system minimally running, so that it powers off when the UPS 
> cuts the power, so that it will turn on again when the power returns, given 
> the default behavior and limitations of PC hardware. Do sensible steps to 
> avoid data loss (stop the disks, etc.). Have this be the default, as PC 
> users are the common case.
> 
> You: Do a normal system shutdown. Rely on server-grade features (e.g. WOL 

No.  Me:  make the whole behaviour you want *optional*, and not the default,
because it is dangerous and we don't have a lick of a chance of making it
safe for all setups.

> packet from a networked UPS) to resume operation, or an "On/Off state: ON"

No.  Rely on standard PC97 ACPI desktop BIOS option "always power on on AC
return", which is the correct way to deal with machines that need to restart
when an UPS powers them up again.

> BIOS setting (despite the problems associated with that). Have this be the 
> default, as the risk of data loss from fragile storage media trumps that of 
> system unavailability after an extended outage.

No.  This is a local decision done by a local admin.  It cannot be a default
setting for Debian.  The Debian default must be the *safest* choice we have.

> Mr. Quette will have to decide this, but I don't think you've made a strong 
> case for a power-cut being significantly detrimental to data or hardware. 

I have not seen you make a case at all for a *default* behaviour.  You don't
need it to be default, you just need it to exist.

> I'm getting the impression that "hardware that cannot do it properly," as 
> you mean it, includes most PCs and non-server machines. Your view carries 
> the day if NUT's userbase is not mostly these.

It has been at least five years since I've last seen a desktop PC that is
incapable of "always power on", with the exception of some laptops.   I am
not buying your assertion that most desktop PCs cannot do it properly, but
even if this were true, it would still be a dangerous, unacceptable default
behaviour for NUT to do what you proposed.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh



Information forwarded to debian-bugs-dist@lists.debian.org, Arnaud Quette <aquette@debian.org>:
Bug#358696; Package nut. Full text and rfc822 format available.

Acknowledgement sent to "Daniel Richard G." <skunk@iSKUNK.ORG>:
Extra info received and forwarded to list. Copy sent to Arnaud Quette <aquette@debian.org>. Full text and rfc822 format available.

Message #55 received at 358696@bugs.debian.org (full text, mbox):

From: "Daniel Richard G." <skunk@iSKUNK.ORG>
To: 358696@bugs.debian.org
Subject: Re: This is dangerous, please make it default to disabled
Date: Mon, 31 Jul 2006 22:57:12 -0400
On Mon, 2006 Jul 31 13:47:21 -0300, Henrique de Moraes Holschuh wrote:
> > 
> > Define "(un)suitable for UPS management." Does this definition include
> > most people's desktop systems?
> 
> Suitable for UPS management:
> 	Load:
> 		Powers up when AC returns
> 		Can be informed that it must shutdown by the UPS
> 			(through NUT).

Okay, so pretty much anything that can run NUT. Nice.

> 	UPS:
> 		Does delayed load shutdown upon shutdown command
> 		Does not power up the load before it has enough charge
> 			to do a delayed shutdown, plus safety margin.

Pretty basic stuff, yes.

> 		Always power-cycles the load after a shutdown command is
> 			ACK'ed to the controlling host.  Even if AC
> 			returns, and it doesn't need to shutdown anymore.

Many low-end UPSes fail here. Power races would be an academic issue if not 
for this.

> 		Communicates to the host when battery charge is below a
> 			certain threshold, so that it can shutdown safely.
> 		Powers up the load if the batteries have enough charge,
> 			and an AC cycle happens while the load is offline.
> 		Powers up the load after a timer expires, if no AC cycles
> 			happen AND the load was brought offline by an explicit
> 			delayed shutdown command.
> 
> Anything else is unsuitable.

Hi, I'm Bob, and I have an unsuitable UPS. Can I use it with Debian?

> Any PC97 desktop should be suitable for proper UPS management.  And just 
> FYI, PC97 requires WoL on all ethernet devices, not that you need WoL for 
> a proper UPS setup, but you somehow got the idea that WoL was a 
> server-grade feature...

Fair enough, but a UPS with an Ethernet port (and a means of configuring 
WoL) certainly is. If not in purpose, then in price.

> > No, but any decent power supply will present a load pretty close to it, 
> 
> Only ones with PFC. 

Decent power supplies have PFC.

> > number of machines connected, but large numbers of machines connected are 
> > not exactly a typical scenario.
> 
> No, but your hard-drive doing emergency unloads is a typical scenario, and
> desktop HDs don't like those unloads *at* *all*.  Do not do it (and as I
> already said, the only proper way to know the HD heads are unloaded requires
> kernel cooperation, and it is NOT done by userspace currently). 
> 
> I know you were under the mistaken impression that we could guarantee all
> HD heads were unloaded in userspace, and before halt runs.  We not only
> cannot do it, we also do not *attempt* to do it.  The only thing in Debian
> initscripts that really tries to take care of HD head unloads is the halt
> command.
> 
> You can, of course, try to make sure hdparm was run and actually unloaded all
> heads for your particular configuration, but it is not an acceptable
> default, because we cannot get it right every time.  So implement it as an
> admin-enabled, admin-configured option by all means.  But *not* as a
> default.

Perhaps the sleep-then-reboot loop belongs inside the halt command, then. 
At some point, there's going to be little difference between cutting power 
to the PSU, and having the PSU do a soft poweroff.

> > need to. What more shutdown magic do you need on a hard disk that is not 
> > spinning?
> 
> None, if the disk spun down.  But hdparm doesn't work for all disks, and we
> cannot reliably spin down all disks and unload heads from userspace for all
> possible configurations.  Thus, anything that relies on this cannot be made
> a default.
>
> > If you're talking about a flaky hardware RAID array where you can't stop 
> 
> SCSI plus all software RAID arrays.

I mean _after_ mdadm is stopped. Not that any distinction is currently made 
between RAID setups, flaky or otherwise.

> The bad thing in your patch is that the maintainer made it non-optional, and
> the default.  I understand it will not be a default anymore, which is enough
> for me.

I agree that having it non-optional is undesirable.

> > You: Do a normal system shutdown. Rely on server-grade features (e.g. WOL 
> 
> No.  Me:  make the whole behaviour you want *optional*, and not the default,
> because it is dangerous and we don't have a lick of a chance of making it
> safe for all setups.

> > packet from a networked UPS) to resume operation, or an "On/Off state: ON"
> 
> No.  Rely on standard PC97 ACPI desktop BIOS option "always power on on AC
> return", which is the correct way to deal with machines that need to restart
> when an UPS powers them up again.

Correct? The PC will then always turn on when the AC returns, e.g. when 
being plugged in, or after a power outage when it was off to begin with. 
The PSU's hard power switch isn't a solution, either, as it is often 
inconvenient/inaccessible and many newer consumer PSUs don't even have one.

The real solution is to have an on/off state bit that can be frobbed by the 
OS, but I'm not holding my breath on that one.

> > BIOS setting (despite the problems associated with that). Have this be the 
> > default, as the risk of data loss from fragile storage media trumps that of 
> > system unavailability after an extended outage.
> 
> No.  This is a local decision done by a local admin.  It cannot be a default
> setting for Debian.  The Debian default must be the *safest* choice we have.

It's the fsck-versus-fsck-y-on-boot debate all over again....

> > Mr. Quette will have to decide this, but I don't think you've made a strong 
> > case for a power-cut being significantly detrimental to data or hardware. 
> 
> I have not seen you make a case at all for a *default* behaviour.  You don't
> need it to be default, you just need it to exist.

My case for the default is based on the notion that most people will be 
running standard systems without all the weirdness you're worried about, 
*and* that the danger you're positing is no worse than any other 
misconfiguration of NUT that causes a shutdown not to occur. If someone 
with a large, fragile RAID installs NUT, doesn't review the configuration, 
and expects his data to survive... well, then, what can you expect? We're 
talking about cutting power here, not twiddling partition tables.

Anyway, I care more about having the option than having it be the default.
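
A minimal sketch of how such an opt-in could be expressed, assuming a
defaults-file variable that is empty unless the admin sets it.  The names
WAIT_BEFORE_REBOOT and poweroff_plan are hypothetical and not taken from the
nut package; the function only prints the action it would take, so the logic
can be inspected without halting or rebooting anything.

```shell
#!/bin/sh
# Hypothetical setting, as an admin might place it in a defaults file.
# Empty (the default) means: do nothing special and let halt proceed normally.
WAIT_BEFORE_REBOOT=""

# Print the action the poweroff step would take for a given wait value,
# instead of executing it (dry run only; nothing is slept on or rebooted).
poweroff_plan() {
    if [ -z "$1" ]; then
        # Option not enabled: control returns to the halt script.
        echo "return to halt"
    else
        # Option enabled: wait for the UPS to cut power, else reboot.
        echo "sleep $1 && reboot"
    fi
}

poweroff_plan "$WAIT_BEFORE_REBOOT"
```

With the variable left empty the behaviour is the ordinary halt; setting it
to, say, 15m selects the wait-then-reboot path discussed in this thread.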

> It has been at least five years since I've last seen a desktop PC that is
> incapable of "always power on", with the exception of some laptops.   I am
> not buying your assertion that most desktop PCs cannot do it properly, but
> even if this were true, it would still be a dangerous, unacceptable default
> behaviour for NUT to do what you proposed.

PCs can do "always power on" just fine; the problem is that "always" really 
does mean "always." The behavior is not reasonable.


--Daniel



Reply sent to Arnaud Quette <aquette@debian.org>:
You have taken responsibility. Full text and rfc822 format available.

Notification sent to Henrique de Moraes Holschuh <hmh@debian.org>:
Bug acknowledged by developer. Full text and rfc822 format available.

Message #60 received at 358696-close@bugs.debian.org (full text, mbox):

From: Arnaud Quette <aquette@debian.org>
To: 358696-close@bugs.debian.org
Subject: Bug#358696: fixed in nut 2.0.4-2
Date: Tue, 01 Aug 2006 01:02:15 -0700
Source: nut
Source-Version: 2.0.4-2

We believe that the bug you reported is fixed in the latest version of
nut, which is due to be installed in the Debian FTP archive:

nut-cgi_2.0.4-2_i386.deb
  to pool/main/n/nut/nut-cgi_2.0.4-2_i386.deb
nut-dev_2.0.4-2_i386.deb
  to pool/main/n/nut/nut-dev_2.0.4-2_i386.deb
nut-snmp_2.0.4-2_i386.deb
  to pool/main/n/nut/nut-snmp_2.0.4-2_i386.deb
nut-usb_2.0.4-2_i386.deb
  to pool/main/n/nut/nut-usb_2.0.4-2_i386.deb
nut_2.0.4-2.diff.gz
  to pool/main/n/nut/nut_2.0.4-2.diff.gz
nut_2.0.4-2.dsc
  to pool/main/n/nut/nut_2.0.4-2.dsc
nut_2.0.4-2_i386.deb
  to pool/main/n/nut/nut_2.0.4-2_i386.deb



A summary of the changes between this version and the previous one is
attached.

Thank you for reporting the bug, which will now be closed.  If you
have further comments please address them to 358696@bugs.debian.org,
and the maintainer will reopen the bug report if appropriate.

Debian distribution maintenance software
pp.
Arnaud Quette <aquette@debian.org> (supplier of updated nut package)

(This message was generated automatically at their request; if you
believe that there is a problem with it please contact the archive
administrators by mailing ftpmaster@debian.org)


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Format: 1.7
Date: Tue, 01 Aug 2006 08:50:26 +0200
Source: nut
Binary: nut nut-usb nut-dev nut-snmp nut-cgi
Architecture: source i386
Version: 2.0.4-2
Distribution: unstable
Urgency: low
Maintainer: Arnaud Quette <aquette@debian.org>
Changed-By: Arnaud Quette <aquette@debian.org>
Description: 
 nut        - The core system of the nut - Network UPS Tools
 nut-cgi    - A web interface sub system for the nut - Network UPS Tools
 nut-dev    - Development files for the nut - Network UPS Tools
 nut-snmp   - A meta SNMP Driver subsystem for the nut - Network UPS Tools
 nut-usb    - USB Drivers subsystem for the nut - Network UPS Tools
Closes: 358696
Changes: 
 nut (2.0.4-2) unstable; urgency=low
 .
   * debian/rules: replace $PWD by CURDIR to satisfy buildd
   * debian/nut.default, debian/nut.init: make the bug 358696 workaround
     optional as it might be dangerous under some circumstances (closes:
     #358696)
Files: 
 f4f18f81d20efc9ecc12a4332153b814 769 admin optional nut_2.0.4-2.dsc
 d26dd1db440cee4459510e62d2ff1d2a 28747 admin optional nut_2.0.4-2.diff.gz
 df6f3ff6e8f71b5bf0e290c5a1c17496 1016732 admin optional nut_2.0.4-2_i386.deb
 36ab7a93a94c2b593a77326b731aaf35 100452 admin optional nut-cgi_2.0.4-2_i386.deb
 775adacc7e5350704759fdeacde41613 81702 admin optional nut-snmp_2.0.4-2_i386.deb
 3eab3ba6e7e096172ab6092f6998e628 183294 admin optional nut-usb_2.0.4-2_i386.deb
 bf4cd05b34bc1a7fa93572e6261aaf73 87006 admin optional nut-dev_2.0.4-2_i386.deb

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (GNU/Linux)

iD8DBQFEzwft22QUyiBN3xsRAkAaAJoCajWtVuGt7bUwdb/12Q/E+68VIQCeNysY
81FOoMDwQD0TOOmuWceas8c=
=EFMv
-----END PGP SIGNATURE-----




Bug archived. Request was from Debbugs Internal Request <owner@bugs.debian.org> to internal_control@bugs.debian.org. (Wed, 27 Jun 2007 07:21:02 GMT) Full text and rfc822 format available.

Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Thu Apr 17 11:30:01 2014; Machine Name: buxtehude.debian.org