Debian Bug report logs - #631102
xen: Xen guests on Squeeze lose networking randomly

Package: xen-hypervisor-4.0-amd64; Maintainer for xen-hypervisor-4.0-amd64 is (unknown);

Reported by: Kevin Bowling <kevin.bowling@kev009.com>

Date: Mon, 20 Jun 2011 09:51:08 UTC

Severity: important

Tags: moreinfo

Found in version xen/4.0.1-2

Done: Hans van Kranenburg <hans.van.kranenburg@mendix.com>

Bug is archived. No further changes may be made.


Report forwarded to debian-bugs-dist@lists.debian.org, kevin.bowling@kev009.com, Debian Xen Team <pkg-xen-devel@lists.alioth.debian.org>:
Bug#631102; Package xen-hypervisor-4.0-amd64. (Mon, 20 Jun 2011 09:51:11 GMT) (full text, mbox, link).


Acknowledgement sent to Kevin Bowling <kevin.bowling@kev009.com>:
New Bug report received and forwarded. Copy sent to kevin.bowling@kev009.com, Debian Xen Team <pkg-xen-devel@lists.alioth.debian.org>. (Mon, 20 Jun 2011 09:51:13 GMT) (full text, mbox, link).


Message #5 received at submit@bugs.debian.org (full text, mbox, reply):

From: Kevin Bowling <kevin.bowling@kev009.com>
To: Debian Bug Tracking System <submit@bugs.debian.org>
Subject: xen: Xen guests on Squeeze lose networking randomly
Date: Mon, 20 Jun 2011 05:50:30 -0400
Package: xen-hypervisor-4.0-amd64
Version: 4.0.1-2
Severity: grave
File: xen
Justification: renders package unusable


Debian Squeeze Dom0, up to date.

Networking is handled by OS scripts, such that br0 and br1 are bridge interfaces.

Without warning, the DomU (Ubuntu 10.04 LTS) loses inbound connectivity.  It tends to happen after several hours.  It doesn't seem to be affected by throughput and is triggered by very small amounts of traffic (5MB in and out over that time).  Having active traffic doesn't help; it still dies.

Updated the DomU to kernel 2.6.35 with no change.  I suspect the problem lies on the Dom0 side.

It's possible to "revive" the DomU for a short while by getting a console with xm on Dom0 and then sending pings to the Dom0 and other hosts.

-- System Information:
Debian Release: 6.0.1
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.32-5-xen-amd64 (SMP w/8 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

xen-hypervisor-4.0-amd64 depends on no packages.

Versions of packages xen-hypervisor-4.0-amd64 recommends:
ii  xen-utils-4.0                 4.0.1-2    XEN administrative tools

Versions of packages xen-hypervisor-4.0-amd64 suggests:
pn  xen-docs-4.0                  <none>     (no description available)

-- no debconf information




Information forwarded to debian-bugs-dist@lists.debian.org, Debian Xen Team <pkg-xen-devel@lists.alioth.debian.org>:
Bug#631102; Package xen-hypervisor-4.0-amd64. (Mon, 20 Jun 2011 10:03:13 GMT) (full text, mbox, link).


Acknowledgement sent to Kevin Bowling <kevin.bowling@kev009.com>:
Extra info received and forwarded to list. Copy sent to Debian Xen Team <pkg-xen-devel@lists.alioth.debian.org>. (Mon, 20 Jun 2011 10:03:17 GMT) (full text, mbox, link).


Message #10 received at 631102@bugs.debian.org (full text, mbox, reply):

From: Kevin Bowling <kevin.bowling@kev009.com>
To: 631102@bugs.debian.org
Subject: Config Files
Date: Mon, 20 Jun 2011 02:58:00 -0700
/etc/xen/xend-config.sxp
https://gist.github.com/1035383

/etc/network/interfaces
https://gist.github.com/1035384

domU.cfg
https://gist.github.com/1035385




Information forwarded to debian-bugs-dist@lists.debian.org, Debian Xen Team <pkg-xen-devel@lists.alioth.debian.org>:
Bug#631102; Package xen-hypervisor-4.0-amd64. (Mon, 20 Jun 2011 10:18:17 GMT) (full text, mbox, link).


Acknowledgement sent to Thomas Goirand <zigo@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Xen Team <pkg-xen-devel@lists.alioth.debian.org>. (Mon, 20 Jun 2011 10:18:19 GMT) (full text, mbox, link).


Message #15 received at 631102@bugs.debian.org (full text, mbox, reply):

From: Thomas Goirand <zigo@debian.org>
To: Kevin Bowling <kevin.bowling@kev009.com>, 631102@bugs.debian.org
Subject: Re: [Pkg-xen-devel] Bug#631102: xen: Xen guests on Squeeze lose networking randomly
Date: Mon, 20 Jun 2011 18:14:33 +0800
[Message part 1 (text/plain, inline)]
On 06/20/2011 05:50 PM, Kevin Bowling wrote:
> Debian Squeeze Dom0, up to date.
> 
> Networking is handled by OS scripts, such that br0 and br1 are bridge interfaces.
> 
> Without warning, the DomU (Ubuntu 10.04 LTS) loses inbound connectivity.  It tends to happen after several hours.  It doesn't seem to be affected by throughput and is triggered by very small amounts of traffic (5MB in and out over that time).  Having active traffic doesn't help; it still dies.
> 
> Updated the DomU to kernel 2.6.35 with no change.  I suspect the problem lies on the Dom0 side.
> 
> It's possible to "revive" the DomU for a short while by getting a console with xm on Dom0 and then sending pings to the Dom0 and other hosts.

My guess is that what's happening isn't what you say. It might well be
due to your switch forgetting about the MAC address of your domU, if
there's no network activity on it (at least that's my guess, and it did
happen as well with Lenny and Debian as domU). I had the issue in many
data centers/switches, and I wrote a small python script to fix it in a
cron job. I have attached the script to this email.

As you can see, this script looks into /etc/xen/auto, so make sure that
you have symlinks to your configuration files in that folder.

Cheers,

Thomas
[dtc-xen-ping-all-ips (text/plain, attachment)]
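
The attached script is not reproduced in this log. As a rough illustration of the workaround described above, here is a minimal sketch, not the attachment itself. It assumes that each file under /etc/xen/auto is a domU configuration whose vif line carries an ip=a.b.c.d entry, and that sending one ping per guest from the Dom0 is enough to generate the traffic. The function names guest_ips and ping_once are made up for this example.

```python
import glob
import os
import re
import subprocess

AUTO_DIR = "/etc/xen/auto"  # directory named in the message above

# Assumption: each guest's address is written as ip=a.b.c.d in its config.
IP_RE = re.compile(r"\bip\s*=\s*(\d{1,3}(?:\.\d{1,3}){3})")


def guest_ips(config_text):
    """Return IPv4 addresses found in ip=... entries, skipping comment lines."""
    ips = []
    for line in config_text.splitlines():
        if line.lstrip().startswith("#"):
            continue
        ips.extend(IP_RE.findall(line))
    return ips


def ping_once(ip):
    """Send one ICMP echo request; return True if a reply came back."""
    result = subprocess.call(
        ["ping", "-c", "1", "-W", "2", ip],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result == 0


def main():
    # Walk every entry (usually symlinks) in the auto-start directory.
    for path in sorted(glob.glob(os.path.join(AUTO_DIR, "*"))):
        if not os.path.isfile(path):
            continue
        with open(path, errors="replace") as handle:
            for ip in guest_ips(handle.read()):
                ping_once(ip)


if __name__ == "__main__":
    main()
```

Run from cron every few minutes, something like this would keep a trickle of traffic flowing to each guest. It only masks the symptom and says nothing about whether the switch or the Dom0 bridge is at fault.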

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Xen Team <pkg-xen-devel@lists.alioth.debian.org>:
Bug#631102; Package xen-hypervisor-4.0-amd64. (Mon, 20 Jun 2011 10:27:08 GMT) (full text, mbox, link).


Acknowledgement sent to Bastian Blank <waldi@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian Xen Team <pkg-xen-devel@lists.alioth.debian.org>. (Mon, 20 Jun 2011 10:27:12 GMT) (full text, mbox, link).


Message #20 received at 631102@bugs.debian.org (full text, mbox, reply):

From: Bastian Blank <waldi@debian.org>
To: Kevin Bowling <kevin.bowling@kev009.com>, 631102@bugs.debian.org
Subject: Re: [Pkg-xen-devel] Bug#631102: xen: Xen guests on Squeeze lose networking randomly
Date: Mon, 20 Jun 2011 12:24:30 +0200
severity 631102 important
tags 631102 moreinfo
thanks

On Mon, Jun 20, 2011 at 05:50:30AM -0400, Kevin Bowling wrote:
> Without warning, the DomU (Ubuntu 10.04 LTS) loses inbound connectivity.  It tends to happen after several hours.  It doesn't seem to be affected by throughput and is triggered by very small amounts of traffic (5MB in and out over that time).  Having active traffic doesn't help; it still dies.

Does this happen with Debian Squeeze? Support for Ubuntu is not here.

> Updated the DomU to kernel 2.6.35 with no change.  I suspect the problem lies on the Dom0 side.

.35 is way too old. Update to .39 from Debian unstable if you want to
prove something.

Bastian

-- 
Vulcans believe peace should not depend on force.
		-- Amanda, "Journey to Babel", stardate 3842.3




Severity set to 'important' from 'grave' Request was from Bastian Blank <waldi@debian.org> to control@bugs.debian.org. (Mon, 20 Jun 2011 10:27:22 GMT) (full text, mbox, link).


Added tag(s) moreinfo. Request was from Bastian Blank <waldi@debian.org> to control@bugs.debian.org. (Mon, 20 Jun 2011 10:27:23 GMT) (full text, mbox, link).


Information forwarded to debian-bugs-dist@lists.debian.org, Debian Xen Team <pkg-xen-devel@lists.alioth.debian.org>:
Bug#631102; Package xen-hypervisor-4.0-amd64. (Mon, 20 Jun 2011 18:48:03 GMT) (full text, mbox, link).


Acknowledgement sent to Kevin Bowling <kevin.bowling@kev009.com>:
Extra info received and forwarded to list. Copy sent to Debian Xen Team <pkg-xen-devel@lists.alioth.debian.org>. (Mon, 20 Jun 2011 18:48:03 GMT) (full text, mbox, link).


Message #29 received at 631102@bugs.debian.org (full text, mbox, reply):

From: Kevin Bowling <kevin.bowling@kev009.com>
To: Bastian Blank <waldi@debian.org>
Cc: 631102@bugs.debian.org
Subject: Re: [Pkg-xen-devel] Bug#631102: xen: Xen guests on Squeeze lose networking randomly
Date: Mon, 20 Jun 2011 11:45:07 -0700
On Mon, Jun 20, 2011 at 3:24 AM, Bastian Blank <waldi@debian.org> wrote:
>
> On Mon, Jun 20, 2011 at 05:50:30AM -0400, Kevin Bowling wrote:
>> Without warning, the DomU (Ubuntu 10.04 LTS) loses inbound connectivity.  It tends to happen after several hours.  It doesn't seem to be affected by throughput and is triggered by very small amounts of traffic (5MB in and out over that time).  Having active traffic doesn't help; it still dies.
>
> Does this happen with Debian Squeeze? Support for Ubuntu is not here.

I'll make a concurrent Squeeze guest and see what happens.

>
>> Updated the DomU to kernel 2.6.35 with no change.  I suspect the problem lies on the Dom0 side.
>
> .35 is way too old. Update to .39 from Debian unstable if you want to
> prove something.

Is there a backport for the Dom0 kernel?  Xen bumps?




Information forwarded to debian-bugs-dist@lists.debian.org, Debian Xen Team <pkg-xen-devel@lists.alioth.debian.org>:
Bug#631102; Package xen-hypervisor-4.0-amd64. (Tue, 21 Jun 2011 00:30:03 GMT) (full text, mbox, link).


Acknowledgement sent to Kevin Bowling <kevin.bowling@kev009.com>:
Extra info received and forwarded to list. Copy sent to Debian Xen Team <pkg-xen-devel@lists.alioth.debian.org>. (Tue, 21 Jun 2011 00:30:03 GMT) (full text, mbox, link).


Message #34 received at 631102@bugs.debian.org (full text, mbox, reply):

From: Kevin Bowling <kevin.bowling@kev009.com>
To: Thomas Goirand <zigo@debian.org>
Cc: 631102@bugs.debian.org
Subject: Re: [Pkg-xen-devel] Bug#631102: xen: Xen guests on Squeeze lose networking randomly
Date: Mon, 20 Jun 2011 17:28:04 -0700
On Mon, Jun 20, 2011 at 3:14 AM, Thomas Goirand <zigo@debian.org> wrote:
> On 06/20/2011 05:50 PM, Kevin Bowling wrote:
>> Debian Squeeze Dom0, up to date.
>>
>> Networking is handled by OS scripts, such that br0 and br1 are bridge interfaces.
>>
>> Without warning, the DomU (Ubuntu 10.04 LTS) loses inbound connectivity.  It tends to happen after several hours.  It doesn't seem to be affected by throughput and is triggered by very small amounts of traffic (5MB in and out over that time).  Having active traffic doesn't help; it still dies.
>>
>> Updated the DomU to kernel 2.6.35 with no change.  I suspect the problem lies on the Dom0 side.

Bumped to 2.6.38, no change.  Also have a Squeeze guest with same
symptoms.  It's definitely on the Dom0 side (kernel, xen, bridging).

>
> My guess is that what's happening isn't what you say. It might well be
> due to your switch forgetting about the MAC address of your domU, if
> there's no network activity on it (at least that's my guess, and it did
> happen as well with Lenny and Debian as domU). I had the issue in many
> data centers/switches, and I wrote a small python script to fix it in a
> cron job. I have attached the script to this email.

It doesn't appear so.  The switch is managed and I see the MAC on the
right port.  I've disabled MAC aging completely on it for further
testing.

arp shows the correct default gateway on the DomU.

The "forgetting" appears to be in the Dom0 networking stack.  Pinging
the DomU from the Dom0 and then pinging various hosts sometimes
revives it.  Except, packets are lost to those hosts that I tried
pinging earlier.

No such problems exist between the Dom0 and the outside world.

Maddening!  Any more ideas to try?




Information forwarded to debian-bugs-dist@lists.debian.org, Debian Xen Team <pkg-xen-devel@lists.alioth.debian.org>:
Bug#631102; Package xen-hypervisor-4.0-amd64. (Tue, 21 Jun 2011 19:15:03 GMT) (full text, mbox, link).


Acknowledgement sent to Thomas Goirand <thomas@goirand.fr>:
Extra info received and forwarded to list. Copy sent to Debian Xen Team <pkg-xen-devel@lists.alioth.debian.org>. (Tue, 21 Jun 2011 19:15:03 GMT) (full text, mbox, link).


Message #39 received at 631102@bugs.debian.org (full text, mbox, reply):

From: Thomas Goirand <thomas@goirand.fr>
To: Bastian Blank <waldi@debian.org>, 631102@bugs.debian.org
Cc: Kevin Bowling <kevin.bowling@kev009.com>
Subject: Re: [Pkg-xen-devel] Bug#631102: Bug#631102: xen: Xen guests on Squeeze lose networking randomly
Date: Wed, 22 Jun 2011 03:10:58 +0800
On 06/20/2011 06:24 PM, Bastian Blank wrote:
> severity 631102 important
> tags 631102 moreinfo
> thanks
> 
> On Mon, Jun 20, 2011 at 05:50:30AM -0400, Kevin Bowling wrote:
>> Without warning, the DomU (Ubuntu 10.04 LTS) loses inbound connectivity.  It tends to happen after several hours.  It doesn't seem to be affected by throughput and is triggered by very small amounts of traffic (5MB in and out over that time).  Having active traffic doesn't help; it still dies.
> 
> Does this happen with Debian Squeeze? Support for Ubuntu is not here.

Yes, I noticed that issue in both Lenny and Squeeze, but since then I
have used my ping cron script to mitigate it, because I thought it was a
switch issue...

Thomas




Information forwarded to debian-bugs-dist@lists.debian.org, Debian Xen Team <pkg-xen-devel@lists.alioth.debian.org>:
Bug#631102; Package xen-hypervisor-4.0-amd64. (Sun, 09 Dec 2012 23:15:06 GMT) (full text, mbox, link).


Acknowledgement sent to Kevin Bowling <kevin.bowling@kev009.com>:
Extra info received and forwarded to list. Copy sent to Debian Xen Team <pkg-xen-devel@lists.alioth.debian.org>. (Sun, 09 Dec 2012 23:15:06 GMT) (full text, mbox, link).


Message #44 received at 631102@bugs.debian.org (full text, mbox, reply):

From: Kevin Bowling <kevin.bowling@kev009.com>
To: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
Cc: 631102@bugs.debian.org
Subject: Re: #631102 and #679533.. related?
Date: Sun, 9 Dec 2012 16:10:14 -0700
[Message part 1 (text/plain, inline)]
I skimmed through this but cannot say.  I no longer have access to the
hardware.  My solution was to switch the entire company to kvm.


On Sun, Dec 9, 2012 at 4:02 PM, Hans van Kranenburg <
hans.van.kranenburg@mendix.com> wrote:

> Kevin,
>
> Would you mind taking a look at #679533
> (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=679533) and give some
> feedback about how similar this issue is to your report in #631102 here?
>
> There was a hint 631102 might be similar, but from the text in this bug
> report it's not clear to me if this is about the same issue. Maybe you
> could comment on it.
>
> The first result of a big shoot-out session (reproducing and isolating the
> bug) last week is that disabling hyperthreading in the server BIOS made
> our issue go away. Knowing that, we now see, looking back, that after we
> switched on HT together with the lenny->squeeze (xen 3->4) upgrade, these
> issues started to pop up weeks later in production.
> The obscure bug did not bite during upgrade testing. :|
>
> Thanks,
>
> --
> Hans van Kranenburg - System / Network Engineer
> T +31 (0)10 2760434 | hans.van.kranenburg@mendix.com | www.mendix.com
>
[Message part 2 (text/html, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Xen Team <pkg-xen-devel@lists.alioth.debian.org>:
Bug#631102; Package xen-hypervisor-4.0-amd64. (Sun, 09 Dec 2012 23:15:08 GMT) (full text, mbox, link).


Acknowledgement sent to Hans van Kranenburg <hans.van.kranenburg@mendix.com>:
Extra info received and forwarded to list. Copy sent to Debian Xen Team <pkg-xen-devel@lists.alioth.debian.org>. (Sun, 09 Dec 2012 23:15:08 GMT) (full text, mbox, link).


Message #49 received at 631102@bugs.debian.org (full text, mbox, reply):

From: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
To: 631102@bugs.debian.org, Kevin Bowling <kevin.bowling@kev009.com>
Subject: #631102 and #679533.. related?
Date: Mon, 10 Dec 2012 00:02:11 +0100
Kevin,

Would you mind taking a look at #679533
(http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=679533) and give some
feedback about how similar this issue is to your report in #631102 here?

There was a hint 631102 might be similar, but from the text in this bug
report it's not clear to me if this is about the same issue. Maybe you
could comment on it.

The first result of a big shoot-out session (reproducing and isolating the
bug) last week is that disabling hyperthreading in the server BIOS made
our issue go away. Knowing that, we now see, looking back, that after we
switched on HT together with the lenny->squeeze (xen 3->4) upgrade, these
issues started to pop up weeks later in production.
The obscure bug did not bite during upgrade testing. :|

Thanks,

-- 
Hans van Kranenburg - System / Network Engineer
T +31 (0)10 2760434 | hans.van.kranenburg@mendix.com | www.mendix.com



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Xen Team <pkg-xen-devel@lists.alioth.debian.org>:
Bug#631102; Package xen-hypervisor-4.0-amd64. (Sun, 09 Dec 2012 23:15:10 GMT) (full text, mbox, link).


Acknowledgement sent to Hans van Kranenburg <hans.van.kranenburg@mendix.com>:
Extra info received and forwarded to list. Copy sent to Debian Xen Team <pkg-xen-devel@lists.alioth.debian.org>. (Sun, 09 Dec 2012 23:15:10 GMT) (full text, mbox, link).


Message #54 received at 631102@bugs.debian.org (full text, mbox, reply):

From: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
To: Kevin Bowling <kevin.bowling@kev009.com>
Cc: 631102@bugs.debian.org
Subject: Re: #631102 and #679533.. related?
Date: Mon, 10 Dec 2012 00:13:00 +0100
On 12/10/2012 12:10 AM, Kevin Bowling wrote:
> I skimmed through this but cannot say.  I no longer have access to the
> hardware.

Ok.

> My solution was to switch the entire company to kvm.

Haha :-)

You might consider closing the bug report, though, if the issue is no
longer relevant.

Thanks,

-- 
Hans van Kranenburg - System / Network Engineer
T +31 (0)10 2760434 | hans.van.kranenburg@mendix.com | www.mendix.com



Information forwarded to debian-bugs-dist@lists.debian.org, Debian Xen Team <pkg-xen-devel@lists.alioth.debian.org>:
Bug#631102; Package xen-hypervisor-4.0-amd64. (Sun, 09 Dec 2012 23:21:03 GMT) (full text, mbox, link).


Acknowledgement sent to Kevin Bowling <kevin.bowling@kev009.com>:
Extra info received and forwarded to list. Copy sent to Debian Xen Team <pkg-xen-devel@lists.alioth.debian.org>. (Sun, 09 Dec 2012 23:21:03 GMT) (full text, mbox, link).


Message #59 received at 631102@bugs.debian.org (full text, mbox, reply):

From: Kevin Bowling <kevin.bowling@kev009.com>
To: 631102@bugs.debian.org
Cc: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
Subject: Re: #631102 and #679533.. related?
Date: Sun, 9 Dec 2012 16:17:32 -0700
[Message part 1 (text/plain, inline)]
I suggest closing bug 631102 as unreproducible if it has not been seen in
the wild.  I do not have access to the hardware anymore.


On Sun, Dec 9, 2012 at 4:13 PM, Hans van Kranenburg <
hans.van.kranenburg@mendix.com> wrote:

> On 12/10/2012 12:10 AM, Kevin Bowling wrote:
> > I skimmed through this but cannot say.  I no longer have access to the
> > hardware.
>
> Ok.
>
> > My solution was to switch the entire company to kvm.
>
> Haha :-)
>
> You might consider closing the bug report, though, if the issue is no
> longer relevant.
>
> Thanks,
>
> --
> Hans van Kranenburg - System / Network Engineer
> T +31 (0)10 2760434 | hans.van.kranenburg@mendix.com | www.mendix.com
>
[Message part 2 (text/html, inline)]

Reply sent to Hans van Kranenburg <hans.van.kranenburg@mendix.com>:
You have taken responsibility. (Sun, 09 Dec 2012 23:33:08 GMT) (full text, mbox, link).


Notification sent to Kevin Bowling <kevin.bowling@kev009.com>:
Bug acknowledged by developer. (Sun, 09 Dec 2012 23:33:08 GMT) (full text, mbox, link).


Message #64 received at 631102-done@bugs.debian.org (full text, mbox, reply):

From: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
To: Kevin Bowling <kevin.bowling@kev009.com>
Cc: 631102-done@bugs.debian.org
Subject: Re: #631102 and #679533.. related?
Date: Mon, 10 Dec 2012 00:24:02 +0100
Ok, closing now.

Hfgl, thanks,

On 12/10/2012 12:17 AM, Kevin Bowling wrote:
> I suggest closing bug 631102 as unreproducible if it has not been seen in
> the wild.  I do not have access to the hardware anymore.
> 
> 
> On Sun, Dec 9, 2012 at 4:13 PM, Hans van Kranenburg <
> hans.van.kranenburg@mendix.com> wrote:
> 
>> On 12/10/2012 12:10 AM, Kevin Bowling wrote:
>>> I skimmed through this but cannot say.  I no longer have access to the
>>> hardware.
>>
>> Ok.
>>
>>> My solution was to switch the entire company to kvm.
>>
>> Haha :-)
>>
>> You might consider closing the bug report, though, if the issue is no
>> longer relevant.
>>
>> Thanks,
>>
>> --
>> Hans van Kranenburg - System / Network Engineer
>> T +31 (0)10 2760434 | hans.van.kranenburg@mendix.com | www.mendix.com
>>
> 


-- 
Hans van Kranenburg - System / Network Engineer
T +31 (0)10 2760434 | hans.van.kranenburg@mendix.com | www.mendix.com



Bug archived. Request was from Debbugs Internal Request <owner@bugs.debian.org> to internal_control@bugs.debian.org. (Mon, 07 Jan 2013 07:25:52 GMT) (full text, mbox, link).



Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Sun Jan 14 05:38:17 2018; Machine Name: buxtehude
