Debian Bug report logs - #599161
xen-linux-system-2.6.32-5-xen-amd64: Clock moved forward 50 minutes, caused Xen HVM domU restart

Package: src:xen; Maintainer for src:xen is Debian Xen Team <pkg-xen-devel@lists.alioth.debian.org>;

Reported by: Mark Adams <mark@campbell-lange.net>

Date: Tue, 5 Oct 2010 08:33:02 UTC

Severity: important

Tags: fixed-upstream, patch

Merged with 674907

Found in version xen/4.0.1-5.5

Fixed in versions xen/4.1.3-7, xen/4.0.1-5.6

Done: Thomas Goirand <zigo@debian.org>

Bug is archived. No further changes may be made.

Report forwarded to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#599161; Package xen-linux-system-2.6.32-5-xen-amd64. (Tue, 05 Oct 2010 08:33:04 GMT) (full text, mbox, link).

Acknowledgement sent to Mark Adams <mark@campbell-lange.net>:
New Bug report received and forwarded. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Tue, 05 Oct 2010 08:33:05 GMT) (full text, mbox, link).

Message #5 received at submit@bugs.debian.org (full text, mbox, reply):

Package: xen-linux-system-2.6.32-5-xen-amd64
Version: 2.6.32-21
Severity: important


Hi, did you receive this bug report? I did not receive a bug ID even
though I received a copy of the original report, probably because the
address it was originally sent from was invalid.

-----------------

Hi All,

I'm running Xen 4.0.1-rc6 on Debian squeeze with the pvops 2.6.32-21 kernel.
Today I noticed (when Kerberos to the domain controllers stopped
working) that the clock was 50 minutes out in dom0. This caused the
HVM Windows domain controllers to have the wrong time.

I'm not sure if this is a kernel issue or a Xen issue, but the only
related thing I can see is the following in the kernel log:

Oct  2 18:50:33 havhost1 kernel: [623480.977748] Clocksource tsc unstable (delta = -2999660303788 ns)

But I also see in the dmesg log that Xen is using its own clock:

[    7.676563] Switching to clocksource xen

I can't identify anything else in the logs to indicate when the time
might have changed. I have a few other dom0s at the same level whose
clocks have not changed.

Can anyone confirm whether Xen or the kernel controls the time? Also,
when I corrected the time in dom0 it was still wrong in the HVM domU. How
long does it take for this to propagate? (I rebooted the VMs to correct
it immediately.)

It appears the HVM domU (Windows Server 2008)
unexpectedly shut down at 18:51, after the unstable clocksource error.
The qemu-dm logs show a reset ("reset requested in cpu_handle_ioreq.") and
xend.log shows a reboot.

Any other pointers on how to ensure stability of clocks from dom0 to
HVM domUs (and PV ones, for that matter) would be appreciated. NTP was running
but also appears to have crashed at the same time.

Cheers,
Mark

-- System Information:
Debian Release: squeeze/sid
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.32-5-xen-amd64 (SMP w/8 CPU cores)
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages xen-linux-system-2.6.32-5-xen-amd64 depends on:
ii  linux-image-2.6.32-5-xen-amd 2.6.32-21   Linux 2.6.32 for 64-bit PCs, Xen d
ii  xen-hypervisor-3.2-1-amd64 [ 3.2.1-2     The Xen Hypervisor on AMD64
ii  xen-hypervisor-4.0-amd64 [xe 4.0.1~rc6-1 The Xen Hypervisor on AMD64

xen-linux-system-2.6.32-5-xen-amd64 recommends no packages.

xen-linux-system-2.6.32-5-xen-amd64 suggests no packages.

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#599161; Package xen-linux-system-2.6.32-5-xen-amd64. (Wed, 06 Oct 2010 02:39:03 GMT) (full text, mbox, link).

Acknowledgement sent to Ben Hutchings <ben@decadent.org.uk>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Wed, 06 Oct 2010 02:39:03 GMT) (full text, mbox, link).

Message #10 received at 599161@bugs.debian.org (full text, mbox, reply):

[Message part 1 (text/plain, inline)]

On Tue, 2010-10-05 at 09:31 +0100, Mark Adams wrote:
> Package: xen-linux-system-2.6.32-5-xen-amd64
> Version: 2.6.32-21
> Severity: important
> 
> 
> Hi, Did you receive this bug report? I hadn't received a bug ID even
> though receiving the copy of the original report.

We don't have any other bug report with this description.

> Likely because the address it was sent from originally was invalid.

That would explain it.

> -----------------
> 
> Hi All,
>
> I'm running Xen 4.0.1-rc6 Debian squeeze with pvops 2.6.32-21 kernel.
> Today I noticed (when kerberos to the domain controllers stopped
> working..) that the clock was 50 minutes out in dom0 -- This caused the
> HVM windows domain controllers to have the wrong time.

Since you appear to be in the UK, is it possible that the real-time
clock is set to local time (GMT+1) while Xen expects it to be GMT, or
vice versa?

(This doesn't explain why it's 50 minutes out rather than 1 hour.  But
ntpd will refuse to correct a large difference and the local clock may
then drift further.)
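
As a rough illustration of the point about ntpd refusing large corrections: the sketch below assumes ntpd's default panic threshold of 1000 seconds and simply compares two offsets against it. The function name and the offset labels are made up for this example; the 2999.66 s figure is the magnitude of the delta in the kernel log line quoted in this report.

```python
# ntpd's default panic threshold: if the offset exceeds this, ntpd exits
# instead of correcting the clock (unless started with -g, which permits
# one large correction).
NTPD_PANIC_THRESHOLD_S = 1000.0


def ntpd_would_panic(offset_s: float,
                     threshold_s: float = NTPD_PANIC_THRESHOLD_S) -> bool:
    """Return True if an offset of this size exceeds the panic threshold."""
    return abs(offset_s) > threshold_s


# Hypothetical labels; the values are a one-hour timezone mismatch and the
# magnitude of the delta reported in the kernel log.
offsets = [
    ("one-hour RTC/timezone mismatch", 3600.0),
    ("jump reported in the kernel log", 2999.66),
]

for label, offset_s in offsets:
    print(f"{label}: {offset_s:.2f} s, exceeds threshold: {ntpd_would_panic(offset_s)}")
```

Both offsets are well above the default threshold, which would be consistent with ntpd stopping rather than stepping the clock back.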

[...]
> Can anyone confirm whether xen controls the time or the kernel? Also
> when I corrected the time in dom0 it was still wrong in HVM domU -- How
> long does it take for this to propagate? (I rebooted the VMs to correct
> it immediately).
[...]

For HVM guests, the hypervisor emulates a standard PC real-time clock
and the guest uses that to initialise the system time, but there is no
way to force an update after the guest has booted unless the guest has
specific support for Xen; I assume Citrix does provide such software for
Windows but I don't know whether it is free software.

For PV guests, I assume you can force an update to the guest time using
the Xen management tools.

Note, I'm just a general kernel maintainer and don't have any great
knowledge of Xen.

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.

[signature.asc (application/pgp-signature, inline)]

Acknowledgement sent to Mark Adams <mark@campbell-lange.net>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Wed, 06 Oct 2010 09:18:08 GMT) (full text, mbox, link).

Message #15 received at 599161@bugs.debian.org (full text, mbox, reply):

Hi Ben, Thanks for your response. See my responses inline.

On Wed, Oct 06, 2010 at 03:35:23AM +0100, Ben Hutchings wrote:
> On Tue, 2010-10-05 at 09:31 +0100, Mark Adams wrote:
> > Package: xen-linux-system-2.6.32-5-xen-amd64
> > Version: 2.6.32-21
> > Severity: important
> > 
> > 
> > Hi, Did you receive this bug report? I hadn't received a bug ID even
> > though receiving the copy of the original report.
> 
> We don't have any other bug report with this description.
> 
> > Likely because the address it was sent from originally was invalid.
> 
> That would explain it.
> 
> > -----------------
> > 
> > Hi All,
> >
> > I'm running Xen 4.0.1-rc6 Debian squeeze with pvops 2.6.32-21 kernel.
> > Today I noticed (when kerberos to the domain controllers stopped
> > working..) that the clock was 50 minutes out in dom0 -- This caused the
> > HVM windows domain controllers to have the wrong time.
> 
> Since you appear to be in the UK, is it possible that the real-time
> clock is set to local time (GMT+1) while Xen expects it to be GMT, or
> vice versa?

The clock is set with tzdata as BST yes, it is also set to this in the
Windows server 2008 domU. We are using localtime=1 to match the clock
in dom0 to domU.

> 
> (This doesn't explain why it's 50 minutes out rather than 1 hour.  But
> ntpd will refuse to correct a large difference and the local clock may
> then drift further.)
> 
> [...]
> > Can anyone confirm whether xen controls the time or the kernel? Also
> > when I corrected the time in dom0 it was still wrong in HVM domU -- How
> > long does it take for this to propagate? (I rebooted the VMs to correct
> > it immediately).
> [...]
> 
> For HVM guests, the hypervisor emulates a standard PC real-time clock
> and the guest uses that to initialise the system time, but there is no
> way to force an update after the guest has booted unless the guest has
> specific support for Xen; I assume Citrix does provide such software for
> Windows but I don't know whether it is free software.

The citrix WHQL drivers might have this functionality, I don't use them
though - prefer the GPL PV drivers! (which don't have any clock support
as far as I can tell)

> 
> For PV guests, I assume you can force an update to the guest time using
> the Xen management tools.
> 
> Note, I'm just a general kernel maintainer and don't have any great
> knowledge of Xen.

All good, I have a feeling it might be a kernel issue rather than xen,
but I'm still not sure what actually -controls- the time, is it the
kernel? I think the key is in the log

Oct  2 18:50:33 havhost1 kernel: [623480.977748] Clocksource tsc unstable (delta = -2999660303788 ns)
> 
As this is when the clock went from 18:00 to 18:50 and started the chain
of events (restarted the 2008 domU). Any ideas why this log occurred?

> Ben.

Regards,
Mark
> 
> -- 
> Ben Hutchings
> Once a job is fouled up, anything done to improve it makes it worse.

Acknowledgement sent to Ian Campbell <ijc@hellion.org.uk>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Wed, 06 Oct 2010 10:51:03 GMT) (full text, mbox, link).

Message #20 received at 599161@bugs.debian.org (full text, mbox, reply):

On Wed, 2010-10-06 at 09:33 +0100, Mark Adams wrote:
> Hi Ben, Thanks for your response. See my responses inline.
> 
> On Wed, Oct 06, 2010 at 03:35:23AM +0100, Ben Hutchings wrote:
> > On Tue, 2010-10-05 at 09:31 +0100, Mark Adams wrote:
> > > Package: xen-linux-system-2.6.32-5-xen-amd64
> > > Version: 2.6.32-21
> > > Severity: important
> > > 
> > > 
> > > Hi, Did you receive this bug report? I hadn't received a bug ID even
> > > though receiving the copy of the original report.
> > 
> > We don't have any other bug report with this description.
> > 
> > > Likely because the address it was sent from originally was invalid.
> > 
> > That would explain it.
> > 
> > > -----------------
> > > 
> > > Hi All,
> > >
> > > I'm running Xen 4.0.1-rc6 Debian squeeze with pvops 2.6.32-21 kernel.
> > > Today I noticed (when kerberos to the domain controllers stopped
> > > working..) that the clock was 50 minutes out in dom0 -- This caused the
> > > HVM windows domain controllers to have the wrong time.
> > 
> > Since you appear to be in the UK, is it possible that the real-time
> > clock is set to local time (GMT+1) while Xen expects it to be GMT, or
> > vice versa?
> 
> The clock is set with tzdata as BST yes, it is also set to this in the
> Windows server 2008 domU. We are using localtime=1 to match the clock
> in dom0 to domU.
> 
> > 
> > (This doesn't explain why it's 50 minutes out rather than 1 hour.  But
> > ntpd will refuse to correct a large difference and the local clock may
> > then drift further.)
> > 
> > [...]
> > > Can anyone confirm whether xen controls the time or the kernel? Also
> > > when I corrected the time in dom0 it was still wrong in HVM domU -- How
> > > long does it take for this to propagate? (I rebooted the VMs to correct
> > > it immediately).
> > [...]
> > 
> > For HVM guests, the hypervisor emulates a standard PC real-time clock
> > and the guest uses that to initialise the system time, but there is no
> > way to force an update after the guest has booted unless the guest has
> > specific support for Xen; I assume Citrix does provide such software for
> > Windows but I don't know whether it is free software.
> 
> The citrix WHQL drivers might have this functionality, I don't use them
> though - prefer the GPL PV drivers! (which don't have any clock support
> as far as I can tell)

I think you can run a regular NTP client (assuming one exists for
Windows) in an HVM guest to keep wallclock time in sync.

> > For PV guests, I assume you can force an update to the guest time using
> > the Xen management tools.
> > 
> > Note, I'm just a general kernel maintainer and don't have any great
> > knowledge of Xen.

I should know but time handling (particularly for HVM guests) is
something where I basically know enough to know that there is lots I
don't know ;-)

Mark, you may find you get better answers/support from the xen-users
mailing list, or failing that, xen-devel.

> All good, I have a feeling it might be a kernel issue rather than xen,
> but I'm still not sure what actually -controls- the time, is it the
> kernel? I think the key is in the log
> 
> Oct  2 18:50:33 havhost1 kernel: [623480.977748] Clocksource tsc unstable (delta = -2999660303788 ns)
> > 
> As this is when the clock went from 18:00 to 18:50 and started the chain
> of events (restarted the 2008 domU). Any ideas why this log occurred?

The TSC appeared to go backwards by a fairly significant amount, which
has upset the kernel.

The behaviour of the virtual TSC as seen by an HVM guest is controlled
by a combination of the tsc_mode setting in your domain configuration
and, I think, by the features of your specific hardware (some have
constant rate TSC, others advertise varying levels of synchronisation
between cores etc). It's then up to the guest kernel whether it even
uses TSC as a timesource at all and how it handles instability etc and
how it derives other time sources (such as the wallclock time) from it.
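
On a Linux dom0 or Linux guest, one way to see which clocksource the kernel has actually settled on is to read the standard sysfs entries. The following is only a minimal sketch: the helper name is made up for illustration, and it does not apply to Windows guests.

```python
from pathlib import Path
from typing import List, Tuple

# Standard Linux sysfs directory for the system clocksource.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(base: Path = CLOCKSOURCE_DIR) -> Tuple[str, List[str]]:
    """Return (current clocksource, list of available clocksources)."""
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available


try:
    current, available = read_clocksources()
    print(f"current clocksource: {current}")
    print(f"available clocksources: {', '.join(available)}")
except OSError as exc:
    # Not running on Linux, or sysfs is not mounted.
    print(f"could not read clocksource information: {exc}")
```

If the current entry reads xen while tsc is still listed as available, that would match the "Switching to clocksource xen" line in the original report.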

Sorry this isn't more helpful, but as I say you will probably get better
answers on one of the Xen mailing lists.

Ian.

-- 
Ian Campbell
Current Noise: Trouble - Wickedness Of Man

 ok, I will not marry Jo-Con-El's cow.

Message #25 received at 599161@bugs.debian.org (full text, mbox, reply):

On Wed, Oct 06, 2010 at 11:47:05AM +0100, Ian Campbell wrote:
> Sorry this isn't more helpful, but as I say you will probably get better
> answers on one of the Xen mailing lists.

Hi Ian,

Thanks for your notes. I've already tried the xen-users list so I will
try xen-devel to see if I can glean any more information about my issue.
I will update the report here when I get any more information.

Thanks, 
Mark


Information forwarded to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#599161; Package xen-linux-system-2.6.32-5-xen-amd64. (Sun, 26 Dec 2010 11:54:02 GMT) (full text, mbox, link).

Acknowledgement sent to Moritz Muehlenhoff <jmm@inutil.org>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Sun, 26 Dec 2010 11:54:03 GMT) (full text, mbox, link).

Message #30 received at 599161@bugs.debian.org (full text, mbox, reply):

On Wed, Oct 06, 2010 at 12:05:18PM +0100, Mark Adams wrote:

> > > As this is when the clock went from 18:00 to 18:50 and started the chain
> > > of events (restarted the 2008 domU). Any ideas why this log occurred?
> > 
> > The TSC appeared to go backwards by a fairly significant amount, which
> > has upset the kernel.
> > 
> > The behaviour of the virtual TSC as seen by an HVM guest is controlled
> > by a combination of the tsc_mode setting in your domain configuration
> > and, I think, by the features of your specific hardware (some have
> > constant rate TSC, others advertise varying levels of synchronisation
> > between cores etc). It's then up to the guest kernel whether it even
> > uses TSC as a timesource at all and how it handles instability etc and
> > how it derives other time sources (such as the wallclock time) from it.
> > 
> > Sorry this isn't more helpful, but as I say you will probably get better
> > answers on one of the Xen mailing lists.
> 
> Hi Ian,
> 
> Thanks for your notes. I've already tried the xen-users list so I will
> try xen-devel to see if I can glean any more information about my issue.
> I will update the report here when I get any more information.

Did you contact them? From what I read from the bug report so far, running
an NTP client in the HVM host should suffice.

Cheers,
        Moritz

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#599161; Package xen-linux-system-2.6.32-5-xen-amd64. (Tue, 28 Dec 2010 12:42:03 GMT) (full text, mbox, link).

Acknowledgement sent to Paweł Puterla <pputerla@ecard.pl>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Tue, 28 Dec 2010 12:42:03 GMT) (full text, mbox, link).

Message #35 received at 599161@bugs.debian.org (full text, mbox, reply):

[Message part 1 (text/plain, inline)]

Hi

I only wanted (for now) to confirm that the bug exists. I have the same
kernel+hvm version as the reporter.
Today we just noticed exactly +2999sec (50min) time skew on our dom0.

The only kernel message I got was:

[3103167.615818] Clocksource tsc unstable (delta = -2999660326203 ns)


I will also try to look after the bug report on xen-* lists.
Just wanted to confirm Mark's report.
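
For comparison, the two delta values reported in this bug can be converted to seconds and minutes with a few lines of arithmetic. This is only a sketch; the dictionary keys are labels chosen for this example, and the numbers are copied from the two kernel messages above.

```python
# Delta values copied from the two "Clocksource tsc unstable" kernel
# messages in this report. The keys are illustrative labels only.
DELTAS_NS = {
    "mark": -2999660303788,
    "pawel": -2999660326203,
}

NS_PER_SECOND = 1_000_000_000


def delta_to_seconds(delta_ns: int) -> float:
    """Return the magnitude of a clocksource delta in seconds."""
    return abs(delta_ns) / NS_PER_SECOND


for reporter, delta_ns in DELTAS_NS.items():
    seconds = delta_to_seconds(delta_ns)
    print(f"{reporter}: {seconds:.3f} s = {seconds / 60:.2f} min")

# Gap between the two independently reported deltas, in nanoseconds.
gap_ns = abs(DELTAS_NS["mark"] - DELTAS_NS["pawel"])
print(f"difference between the two reports: {gap_ns} ns")
```

Both values come to roughly 2999.66 seconds, just under 50 minutes, and they differ from each other by only a few tens of microseconds.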


-- 
Paweł Puterla
Network and Systems Administrator
eCard S.A.
Office: Królewska 16 Street, 00-103 Warsaw, POLAND
Phone: +48 22 493 44 24
pputerla@ecard.pl


Information forwarded to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#599161; Package xen-linux-system-2.6.32-5-xen-amd64. (Tue, 04 Jan 2011 17:27:03 GMT) (full text, mbox, link).

Acknowledgement sent to Mark Adams <mark@campbell-lange.net>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Tue, 04 Jan 2011 17:27:03 GMT) (full text, mbox, link).

Message #40 received at 599161@bugs.debian.org (full text, mbox, reply):

Hi,

No, unfortunately not. I just suffered it again as well over the
Christmas break. Let me know if you find any fix.

Regards,
Mark

On Sun, Dec 26, 2010 at 12:50:26PM +0100, Moritz Muehlenhoff wrote:
> Did you contact them? From what I read from the bug report so far, running
> an NTP client in the HVM host should suffice.
> 
> Cheers,
>         Moritz

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#599161; Package xen-linux-system-2.6.32-5-xen-amd64. (Tue, 04 Jan 2011 17:27:07 GMT) (full text, mbox, link).

Acknowledgement sent to Mark Adams <mark@campbell-lange.net>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Tue, 04 Jan 2011 17:27:07 GMT) (full text, mbox, link).

Message #45 received at 599161@bugs.debian.org (full text, mbox, reply):

In addition, I received the suggestion below from "James Song", but when I
asked what it changed he did not respond, so I haven't tried it.

"added timer_mode =2  and tsc_mode = 1 and viridian=1 into your
configure file."

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#599161; Package xen-linux-system-2.6.32-5-xen-amd64. (Fri, 18 Feb 2011 15:57:09 GMT) (full text, mbox, link).

Acknowledgement sent to Olivier Hanesse <olivier.hanesse@gmail.com>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Fri, 18 Feb 2011 15:57:09 GMT) (full text, mbox, link).

Message #50 received at 599161@bugs.debian.org (full text, mbox, reply):

[Message part 1 (text/plain, inline)]

Hello,

I have this issue on several (10+) Xen 4.0 servers without HVM:

linux-image-2.6.32-bpo.5-xen-amd64          2.6.32-29~bpo50+1          Linux 2.6.32 for 64-bit PCs, Xen dom0 suppor
xen-hypervisor-4.0-amd64                    4.0.1-1                    The Xen Hypervisor on AMD64

Each time there is this strange "50 min" jump:

Nov  4 13:33:55  kernel: [591694.109052] Clocksource tsc unstable (delta = -2999660340870 ns)
Nov 16 19:04:27 kernel: [102888.814677] Clocksource tsc unstable (delta = -2999660352101 ns)
Nov 28 08:16:52 kernel: [2671840.236281] Clocksource tsc unstable (delta = -2999660333313 ns)
Nov 29 08:59:54 kernel: [171581.195202] Clocksource tsc unstable (delta = -2999660341178 ns)
Dec  8 12:58:09  kernel: [3108143.298526] Clocksource tsc unstable (delta = -2999660353020 ns)
Dec 27 02:34:16 kernel: [1012551.748589] Clocksource tsc unstable (delta = -2999660334211 ns)
Jan 12 09:12:53  kernel: [6537645.820016] Clocksource tsc unstable (delta = -2999660339286 ns)
Jan 28 11:04:54  kernel: [5352834.035048] Clocksource tsc unstable (delta = -2999660330184 ns)
Feb  9 21:04:03  kernel: [6415408.244988] Clocksource tsc unstable (delta = -2999660342333 ns)
Feb 10 04:19:24 kernel: [4306193.416722] Clocksource tsc unstable (delta = -2999660332510 ns)

After that, ntp in Dom0 fails with:

ntpd[6834]: time correction of -3000 seconds exceeds sanity limit (1000); set clock manually to the correct UTC time

The same thing happens with ntp in the domUs.
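
The deltas above all correspond to a backward step of about 2999.66 seconds,
which is the "50 min" jump and well past the 1000-second sanity limit quoted
in the ntpd message. A minimal Python sketch of that arithmetic follows; the
helper name delta_to_seconds is made up for illustration, and the sample
values are copied from the log lines above.

```python
# Hypothetical helper for illustration; not part of any Xen or ntp tool.
NTPD_SANITY_LIMIT_S = 1000  # limit quoted in the ntpd message above


def delta_to_seconds(delta_ns: int) -> float:
    """Convert a 'Clocksource tsc unstable' delta from nanoseconds to seconds."""
    return delta_ns / 1e9


# Three of the deltas reported in the kernel log above.
deltas_ns = [-2999660340870, -2999660352101, -2999660333313]

for delta in deltas_ns:
    seconds = delta_to_seconds(delta)
    too_big_for_ntpd = abs(seconds) > NTPD_SANITY_LIMIT_S
    print(f"{seconds:.2f} s = {seconds / 60:.1f} min, "
          f"beyond ntpd sanity limit: {too_big_for_ntpd}")
```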

I didn't have this kind of issue with Xen Lenny.

Regards

Olivier

[Message part 2 (text/html, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#599161; Package xen-linux-system-2.6.32-5-xen-amd64. (Sun, 25 Sep 2011 13:21:17 GMT) (full text, mbox, link).

Acknowledgement sent to Markus Hochholdinger <Markus@hochholdinger.net>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Sun, 25 Sep 2011 13:21:17 GMT) (full text, mbox, link).

Message #55 received at 599161@bugs.debian.org (full text, mbox, reply):

[Message part 1 (text/plain, inline)]

Hello,

I can confirm that there are problems with the clock in PV domUs.

My setups are as follows:
* Debian lenny dom0s running Debian squeeze and Debian lenny domUs (PV)
* Debian squeeze dom0s running Debian squeeze and Debian lenny domUs (PV)

I only have clock/time problems with my squeeze domUs. They use 2.6.32-5-amd64 
and 2.6.32-5-686 kernels from Debian squeeze. It doesn't matter if they run on 
lenny (2.6.26-*-xen-686) or squeeze (2.6.32-*-xen-686) dom0s.

On my Debian lenny dom0s I've set clocksource=jiffies, and my lenny domUs 
also have clocksource=jiffies and independent_wallclock set to 0. All is fine 
here. I'm running ntpd on the dom0s and the domUs use the time from the 
dom0s. (I've tested changing the time in the dom0, and the change also 
shows up in the domUs.) So I have no problems with lenny domUs and time.

With Debian squeeze there's no clocksource jiffies anymore, so I use the 
default clocksource=xen in squeeze dom0s and squeeze domUs. There's also no 
independent_wallclock anymore.

So here's the problem I see: The time in my squeeze domUs is running slightly 
slower than the time on my dom0. Changing the time on the dom0 doesn't change 
the time in the squeeze domUs. But I can run ntpdate/ntp in my squeeze domUs 
and the time gets corrected.

I'm wondering if I can restore the former behavior, where the domU clock 
follows the dom0's clock?


-- 
greetings

eMHa

[signature.asc (application/pgp-signature, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#599161; Package xen-linux-system-2.6.32-5-xen-amd64. (Sun, 25 Sep 2011 18:27:33 GMT) (full text, mbox, link).

Acknowledgement sent to Ian Campbell <ijc@hellion.org.uk>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Sun, 25 Sep 2011 18:27:33 GMT) (full text, mbox, link).

Message #60 received at 599161@bugs.debian.org (full text, mbox, reply):

[Message part 1 (text/plain, inline)]

On Sun, 2011-09-25 at 15:09 +0200, Markus Hochholdinger wrote:
> Hello,
> 
> I can confirm that there are problems with the clock in PV domUs.
> 
> I've setups like as follows:
> * Debian lenny dom0s running Debian squeeze and Debian lenny domUs (PV)
> * Debian squeeze doms running Debian squeeze and Debian lenny domUs (PV)
> 
> I only have clock/time problems with my squeeze domUs. They use 2.6.32-5-amd64 
> and 2.6.32-5-686 kernels from Debian squeeze. It doesn't matter if they run on 
> lenny (2.6.26-*-xen-686) or squeeze (2.6.32-*-xen-686) dom0s.
> 
> With my Debian lenny dom0s I've set clocksource=jiffies and my lenny domUs 
> also have clocksource=jiffies and independent_wallclock set to 0. All is fine 
> here. I've running ntpd on the dom0s and the domUs use the time from the 
> dom0s. (I've tested to change the time in the dom0 and the time change also 
> show up in the domUs.) so I've no Problems with lenny domUs and time.
> 
> With debian squeeze there's no clocksource jiffies anymore, so I use the 
> default clocksource=xen in squeeze doms and squeeze domUs. Also there's no 
> independent_wallclock anymore.
> 
> So here's the problem I see: The time in my squeeze domUs is running slightly 
> slower than the time on my dom0. Changing the time on the dom0 doesn't change 
> the time in the squeeze domUs. But I can run ntpdate/ntp in my squeeze domUs 
> and time gets correct.
> 
> I'm wondering if I can set the former behavior of the domU clock to use the 
> dom0s clock?

I'm afraid that the old "dependent" wallclock mode is not available with
the upstream kernel so no. The current recommendation is to run ntpd in
your guests.

Ian.

-- 
Ian Campbell


Our missions are peaceful -- not for conquest.  When we do battle, it
is only because we have no choice.
		-- Kirk, "The Squire of Gothos", stardate 2124.5

[signature.asc (application/pgp-signature, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#599161; Package xen-linux-system-2.6.32-5-xen-amd64. (Sun, 02 Oct 2011 10:06:52 GMT) (full text, mbox, link).

Message #65 received at 599161@bugs.debian.org (full text, mbox, reply):

[Message part 1 (text/plain, inline)]

Hello,

On 25.09.2011 at 20:23, Ian Campbell <ijc@hellion.org.uk> wrote:
> On Sun, 2011-09-25 at 15:09 +0200, Markus Hochholdinger wrote:
> > I can confirm that there are problems with the clock in PV domUs.
[..]
> > I'm wondering if I can set the former behavior of the domU clock to use
> > the dom0s clock?
> I'm afraid that the old "dependent" wallclock mode is not available with
> the upstream kernel so no. The current recommendation is to run ntpd in
> your guests.

Many thanks for this information. I'm now going to install and configure ntpd 
in all my domUs that run a squeeze kernel.


-- 
greetings

eMHa

[signature.asc (application/pgp-signature, inline)]

Acknowledgement sent to Josip Rodin <joy@debbugs.entuzijast.net>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Wed, 28 Dec 2011 00:51:03 GMT) (full text, mbox, link).

Message #70 received at 599161@bugs.debian.org (full text, mbox, reply):

This clock jump by 2999 seconds also happened here, so per:

http://old-list-archives.xen.org/archives/html/xen-devel/2011-02/msg01557.html

we switched to clocksource=pit in /etc/default/grub's $GRUB_CMDLINE_XEN on
the dom0. This seemed to have avoided the problem, but since then, the clock
jumps started happening like this:

Dec 21 19:42:23 dom0machine kernel: [6034768.658836] Clocksource tsc unstable (delta = -811538856601 ns)

In addition, now I checked what the said machine thinks is its clocksource:

% cat /sys/devices/system/clocksource/clocksource0/current_clocksource /sys/devices/system/clocksource/clocksource0/available_clocksource
xen
xen

So there's neither pit nor tsc in the available list :)

-- 
     2. That which causes joy or happiness.

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#599161; Package xen-linux-system-2.6.32-5-xen-amd64. (Tue, 03 Jan 2012 13:45:08 GMT) (full text, mbox, link).

Acknowledgement sent to Ian Campbell <ijc@hellion.org.uk>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Tue, 03 Jan 2012 13:45:09 GMT) (full text, mbox, link).

Message #75 received at 599161@bugs.debian.org (full text, mbox, reply):

On Wed, 2011-12-28 at 01:49 +0100, Josip Rodin wrote:
> This clock jump by 2999 seconds also happened here, so per:
> 
> http://old-list-archives.xen.org/archives/html/xen-devel/2011-02/msg01557.html
> 
> we switched to clocksource=pit in /etc/default/grub's $GRUB_CMDLINE_XEN on
> the dom0. This seemed to have avoided the problem, but since then, the clock
> jumps started happening like this:
> 
> Dec 21 19:42:23 dom0machine kernel: [6034768.658836] Clocksource tsc unstable (delta = -811538856601 ns)
> 
> In addition, now I checked what the said machine thinks is its clocksource:
> 
> % cat /sys/devices/system/clocksource/clocksource0/current_clocksource /sys/devices/system/clocksource/clocksource0/available_clocksource
> xen
> xen
> 
> So there's neither pit nor tsc in the available list :)

A PV kernel will (or should) always use "xen" as its clocksource. This
is a PV timesource based around the TSC + correction factors (to account
for drift and PCPU migration).

The clocksource=pit on the hypervisor command line controls the
hypervisor's own timesource and not the dom0 kernel's. I'm not sure how
you query the hypervisor for its timesource but I guess it'll be in "xl
dmesg" somewhere ("Platform timer is ...").
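
A minimal Python sketch of pulling that line out of saved "xl dmesg" (or
"xm dmesg") output. It assumes the line has the form
"(XEN) Platform timer is <freq>MHz <name>", which is how it appears in the
hypervisor logs quoted elsewhere in this report; the function name
parse_platform_timer is hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumed line format, e.g. "(XEN) Platform timer is 14.318MHz HPET".
_PLATFORM_TIMER = re.compile(r"\(XEN\) Platform timer is ([0-9.]+)MHz (\S+)")


def parse_platform_timer(dmesg_text: str) -> Optional[Tuple[float, str]]:
    """Return (frequency in MHz, timer name), or None if no such line exists."""
    match = _PLATFORM_TIMER.search(dmesg_text)
    if match is None:
        return None
    return float(match.group(1)), match.group(2)


# Sample text standing in for captured hypervisor dmesg output.
sample = "(XEN) Platform timer is 14.318MHz HPET\n"
print(parse_platform_timer(sample))
```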

The message you quote above says *tsc* unstable. Prior to that was the
system actually using the tsc clocksource? It really shouldn't have
been... Before that message did available_clocksource contain TSC? What
about current_clocksource? ("Before" here ~= on a freshly booted system)

What are your exact hypervisor and kernel command lines? Other than
clocksource=pit are you overriding anything else in this regard?

Can you press the 's' hypervisor debug key and report the resulting text
from dmesg. (press a debug key == "xl debug-key s" + "xl dmesg" or press
Ctrl-A 3 times on serial then press 's').

It seems odd that the only reports we see of this issue are with Debian
Squeeze. It's possible that the snapshot of pvops which made it into
squeeze had some issue but I've just looked over the diff between that
and the current xen 2.6.32 pvops kernel and don't see anything obviously
time related. Perhaps this is a bug in Xen 4.0.x rather than the kernel?

If someone who can reproduce could try (separately) a new kernel and new
hypervisor that might help narrow it down.

Another option instead of clocksource= might be to try tsc=[unstable|
skewed]. Quoth the comment:
        /*
         * tsc=unstable: Override all tests; assume TSC is unreliable.
         * tsc=skewed: Assume TSCs are individually reliable, but skewed across CPUs.
         */

Ian.
-- 
Ian Campbell
Current Noise: Today Is The Day - Pain Is A Warning

A good marriage would be between a blind wife and deaf husband.
		-- Michel de Montaigne

Message #80 received at 599161@bugs.debian.org (full text, mbox, reply):

On Tue, Jan 03, 2012 at 01:42:38PM +0000, Ian Campbell wrote:
> On Wed, 2011-12-28 at 01:49 +0100, Josip Rodin wrote:
> > This clock jump by 2999 seconds also happened here, so per:
> > 
> > http://old-list-archives.xen.org/archives/html/xen-devel/2011-02/msg01557.html
> > 
> > we switched to clocksource=pit in /etc/default/grub's $GRUB_CMDLINE_XEN on
> > the dom0. This seemed to have avoided the problem, but since then, the clock
> > jumps started happening like this:
> > 
> > Dec 21 19:42:23 dom0machine kernel: [6034768.658836] Clocksource tsc unstable (delta = -811538856601 ns)
> > 
> > In addition, now I checked what the said machine thinks is its clocksource:
> > 
> > % cat /sys/devices/system/clocksource/clocksource0/current_clocksource /sys/devices/system/clocksource/clocksource0/available_clocksource
> > xen
> > xen
> > 
> > So there's neither pit nor tsc in the available list :)
> 
> A PV kernel will (or should) always use "xen" as it's clocksource. This
> is a PV timesource based around the TSC + correction factors (to account
> for drift and PCPU migration).
> 
> The clocksource=pit on the hypervisor command line controls the
> hypervisor's own timesource and not the dom0 kernels. I'm not sure how
> you query the hypervisor for its timesource but I guess it'll be in "xl
> dmesg" somewhere ("Platform timer is ...").

Ah, d'oh :) sorry, I wasn't really thinking.

The xm dmesg output on the HP DL360 machines that we have set to clocksource=pit,
and whose time has nevertheless shifted by more than 35996 seconds
in at least five incidents in the last six months, says:

(XEN) Platform timer is 1.193MHz PIT

On a couple of FS RX300's that happened not to have clocksource=pit set but
had time shift by 2999.69 seconds it's this:

(XEN) Platform timer is 14.318MHz HPET

Both also show the following message after the time shift:

(XEN) Platform timer appears to have unexpectedly wrapped 10 or more times.
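
One way to read these numbers: if the platform counter is treated as 32 bits
wide (an assumption here, although a later debug message in this report shows
plt_mask=ffffffff) and runs at the nominal HPET and PIT frequencies of
14.31818 MHz and 1.193182 MHz, then ten wraps come to about 2999.66 s and
about 35996 s respectively, which lines up with the two shift sizes reported.
A minimal Python sketch of the arithmetic, not taken from the Xen source:

```python
# Assumption: the platform counter is masked to 32 bits.
COUNTER_BITS = 32

# Nominal frequencies in Hz (assumed; the log rounds them to 14.318 and 1.193 MHz).
HPET_HZ = 14_318_180
PIT_HZ = 1_193_182


def ten_wraps_seconds(freq_hz: int, bits: int = COUNTER_BITS) -> float:
    """Seconds covered by ten full wraps of a `bits`-wide counter at freq_hz."""
    one_wrap = (2 ** bits) / freq_hz
    return 10 * one_wrap


print(f"HPET: {ten_wraps_seconds(HPET_HZ):.2f} s")
print(f"PIT:  {ten_wraps_seconds(PIT_HZ):.2f} s")
```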


> The message you quote above says *tsc* unstable. Prior to that was the
> system actually using the tsc clocksource? It really shouldn't have
> been... Before that message did available_clocksource contain TSC? What
> about current_clocksource? ("Before" here ~= on a freshly booted system)

The dom0 machines where we set clocksource=pit do see the sole "xen"
clocksource. That didn't stop the time from going awry.

On the dom0 machines that don't have the hypervisor fixated on
clocksource=pit:

* one dom0 that sees both "xen" and "tsc" in available_clocksource, but uses
  "xen" as current_clocksource. Not sure what it used at the time of the
  failure in September, probably the same because we didn't touch that. 
* one that recently failed has:

% dmesg | grep unstable
[4613030.883101] Clocksource tsc unstable (delta = -2999660301416 ns)
% cat /sys/devices/system/clocksource/clocksource0/*
xen
xen

> What are your exact hypervisor and kernel command lines? Other than
> clocksource=pit are you overriding anything else in this regard?

Most of the machines now seem to have:

GRUB_CMDLINE_LINUX="console=tty0 console=ttyS1,115200n1 elevator=deadline"
GRUB_CMDLINE_XEN="dom0_mem=512M clocksource=pit cpuidle=0"

The machines without clocksource=pit only had dom0_mem=512M for the
hypervisor and nothing for the dom0 kernel.

> Can you press the 's' hypervisor debug key and report the resulting text
> from dmesg. (press a debug key == "xl debug-key s" + "xl dmesg" or press
> Ctrl-A 3 times on serial then press 's').

(Note that I used xm for both of those commands, I don't have xl.)

This is the output on a couple of the DL360's with clocksource=pit:

(XEN) TSC has constant rate, deep Cstates possible, so not reliable, warp=3066 (count=1)
(XEN) dom2: mode=0,ofs=0x21e231c896,khz=2333479,inc=1,vtsc count: 10647611967 kernel, 454486411 user
(XEN) dom12: mode=0,ofs=0x21a01e68ddeb,khz=2333479,inc=1,vtsc count: 2478607037 kernel, 199833427 user
(XEN) dom17: mode=0,ofs=0x8d12c3820bf0b,khz=2333479,inc=1,vtsc count: 918220049 kernel, 56818086 user
(XEN) dom18: mode=0,ofs=0x8d1334e2f635f,khz=2333479,inc=1,vtsc count: 4707785417 kernel, 197043637 user
(XEN) dom21: mode=0,ofs=0x1004cc1e5bf801,khz=2333479,inc=1,vtsc count: 6386763431 kernel, 166512523 user
(XEN) dom22: mode=0,ofs=0x14b5955232a7e1,khz=2333479,inc=1,vtsc count: 2218555643 kernel, 88962103 user

(XEN) TSC has constant rate, deep Cstates possible, so not reliable, warp=1715 (count=1)
(XEN) dom1: mode=0,ofs=0x149170bd5f,khz=2333479,inc=1,vtsc count: 36234921552 kernel, 294922844 user

This is the output on an RX300 without clocksource=pit:

(XEN) TSC marked as reliable, warp = 0 (count=2)
(XEN) dom1: mode=0,ofs=0x59e046806,khz=2400116,inc=1
(XEN) No domains have emulated TSC

And finally this is the output on the odd machine that has tsc as an
available clock source:

(XEN) TSC marked as reliable, warp = 0 (count=2)
(XEN) dom1: mode=0,ofs=0x593b1f9e8,khz=2400190,inc=1
(XEN) dom4: mode=0,ofs=0xf3c77d49e41e6,khz=2400190,inc=1
(XEN) No domains have emulated TSC

In the latter case, I've no idea why the domU with the ID 4 would be using
a different clock source - we certainly didn't set it up in any such special
manner, it's been generated and booted like all others.
Within this domU machine, there's:

% cat /sys/devices/system/clocksource/clocksource0/*
xen tsc
xen

So it looks like we consistently use the xen clocksource.

> Another option instead of clocksource= might be to try tsc=[unstable|
> skewed]. Quoth the comment:
>         /*
>          * tsc=unstable: Override all tests; assume TSC is unreliable.
>          * tsc=skewed: Assume TSCs are individually reliable, but skewed across CPUs.
>          */

This is also for the hypervisor, right? 

In any case, I don't quite see what tsc=unstable would bring us - we see
problems both on cases where TSC is marked as reliable and as unreliable,
it's just a different shift value :)

-- 
     2. That which causes joy or happiness.

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#599161; Package xen-linux-system-2.6.32-5-xen-amd64. (Mon, 20 Feb 2012 09:03:03 GMT) (full text, mbox, link).

Acknowledgement sent to Dimitrij Hilt <dimitrij.hilt@fhe3.com>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Mon, 20 Feb 2012 09:03:05 GMT) (full text, mbox, link).

Message #85 received at 599161@bugs.debian.org (full text, mbox, reply):

Hi,

we have exactly the same issue with two Dell PowerEdge R900s. After running 
without any trouble for 50 and 60 days respectively, Dom0 jumped several hours 
into the future.
Dom0 reports the following in "xm debug-key s; xm dmesg":
(XEN) Platform timer appears to have unexpectedly wrapped 10 or more times.
(XEN) TSC has constant rate, deep Cstates possible, so not reliable, warp=15048 (count=1)
(XEN) dom1: mode=0,ofs=0x175a91cd54,khz=2925946,inc=1,vtsc count: 164505657108 kernel, 22209746 user
(XEN) dom3: mode=0,ofs=0xb0d3e99f1f81,khz=2925946,inc=1,vtsc count: 30339779336 kernel, 19604648 user
(XEN) dom4: mode=0,ofs=0xdc739842b4f4,khz=2925946,inc=1,vtsc count: 5903404261 kernel, 29402760 user
(XEN) dom12: mode=0,ofs=0x12f1b269dbff1d,khz=2925946,inc=1,vtsc count: 6914273 kernel, 78450 user

DomU on same Host:
kernel: : [5100661.449288] Clocksource tsc unstable (delta = -807453857134 ns)

We are running the hypervisor with the boot line "/boot/xen-4.0-amd64.gz placeholder 
clocksource=pit cpuidle=0 dom0_mem=512M loglvl=all guest_loglvl=all"

As CPU we have 4x "X7350  @ 2.93GHz"

Any ideas how we can fix or avoid these jumps? Is it an option to have the 
DomU ignore clock jumps in Dom0 (xen.independent_wallclock=1 was an option 
in Xen 3)?

We have no problem if Dom0 jumps, but for DomU it is catastrophic.

Regards,

Dimitrij

-- 
Dimitrij Hilt

dimitrij.hilt@fhe3.com
http://www.fhe3.com/

Bug reassigned from package 'xen-linux-system-2.6.32-5-xen-amd64' to 'src:linux-2.6'. Request was from Ben Hutchings <ben@decadent.org.uk> to control@bugs.debian.org. (Mon, 04 Jun 2012 06:06:33 GMT) (full text, mbox, link).

No longer marked as found in versions linux-2.6/2.6.32-21. Request was from Ben Hutchings <ben@decadent.org.uk> to control@bugs.debian.org. (Mon, 04 Jun 2012 06:06:34 GMT) (full text, mbox, link).

Marked as found in versions linux-2.6/2.6.32-21. Request was from Ben Hutchings <ben@decadent.org.uk> to control@bugs.debian.org. (Mon, 04 Jun 2012 06:06:35 GMT) (full text, mbox, link).

Marked as found in versions linux-2.6/2.6.32-21; no longer marked as found in versions linux-2.6/2.6.32-21. Request was from Ben Hutchings <ben@decadent.org.uk> to control@bugs.debian.org. (Mon, 04 Jun 2012 06:06:35 GMT) (full text, mbox, link).

Merged 599161 674907 Request was from Bastian Blank <waldi@debian.org> to control@bugs.debian.org. (Mon, 01 Oct 2012 14:09:07 GMT) (full text, mbox, link).

Severity set to 'grave' from 'important' Request was from Antoine Beaupré <anarcat@debian.org> to control@bugs.debian.org. (Mon, 01 Oct 2012 14:24:05 GMT) (full text, mbox, link).

Severity set to 'important' from 'grave' Request was from maximilian attems <maks@debian.org> to control@bugs.debian.org. (Mon, 01 Oct 2012 14:39:03 GMT) (full text, mbox, link).

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#599161; Package src:linux-2.6. (Mon, 22 Oct 2012 08:09:03 GMT) (full text, mbox, link).

Acknowledgement sent to Valentin Vidić <Valentin.Vidic@CARNet.hr>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Mon, 22 Oct 2012 08:09:03 GMT) (full text, mbox, link).

Message #104 received at 599161@bugs.debian.org (full text, mbox, reply):

Here is a backtrace related to this problem, in case it helps:

[863910.147108] Clocksource tsc unstable (delta = -811538859723 ns)
[863910.149479] BUG: soft lockup - CPU#2 stuck for 32768s! [swapper:0]
[863910.149479] Modules linked in: hmac sha1_generic ipmi_devintf drbd lru_cache cn xen_evtchn xenfs ip6t_REJECT ipt_REJECT ip6t_LOG ipt_LOG xt_multiport xt_tcpudp xt_comment ip6table_filter ip6_tables iptable_filter ip_tables x_tables bridge bonding 8021q garp stp loop snd_pcm snd_timer radeon ttm snd drm_kms_helper soundcore drm snd_page_alloc i5000_edac psmouse i2c_algo_bit hpilo pcspkr ipmi_si edac_core evdev ipmi_msghandler rng_core serio_raw i2c_core i5k_amb hpwdt container shpchp pci_hotplug processor acpi_processor button ext3 jbd mbcache dm_mod usbhid hid sg sr_mod cdrom ata_generic uhci_hcd ata_piix ehci_hcd libata usbcore nls_base bnx2 hpsa cciss scsi_mod thermal thermal_sys [last unloaded: scsi_wait_scan]
[863910.149479] CPU 2:
[863910.149479] Modules linked in: hmac sha1_generic ipmi_devintf drbd lru_cache cn xen_evtchn xenfs ip6t_REJECT ipt_REJECT ip6t_LOG ipt_LOG xt_multiport xt_tcpudp xt_comment ip6table_filter ip6_tables iptable_filter ip_tables x_tables bridge bonding 8021q garp stp loop snd_pcm snd_timer radeon ttm snd drm_kms_helper soundcore drm snd_page_alloc i5000_edac psmouse i2c_algo_bit hpilo pcspkr ipmi_si edac_core evdev ipmi_msghandler rng_core serio_raw i2c_core i5k_amb hpwdt container shpchp pci_hotplug processor acpi_processor button ext3 jbd mbcache dm_mod usbhid hid sg sr_mod cdrom ata_generic uhci_hcd ata_piix ehci_hcd libata usbcore nls_base bnx2 hpsa cciss scsi_mod thermal thermal_sys [last unloaded: scsi_wait_scan]
[863910.149479] Pid: 0, comm: swapper Not tainted 2.6.32-5-xen-amd64 #1 ProLiant DL360 G5
[863910.149479] RIP: e030:[<ffffffff8100922a>]  [<ffffffff8100922a>] hypercall_page+0x22a/0x1001
[863910.149479] RSP: e02b:ffff8800034f4de8  EFLAGS: 00000246
[863910.149479] RAX: 0000000000040000 RBX: ffff8800034f4ea0 RCX: ffffffff8100922a
[863910.149479] RDX: ffff8800034f4ea0 RSI: 0000000000000000 RDI: 0000000000000000
[863910.149479] RBP: ffff88001fd88000 R08: 00000000000000d9 R09: 00000000000000fa
[863910.149479] R10: ffff88001fd59fd8 R11: 0000000000000246 R12: ffff8800023128f8
[863910.149479] R13: ffffffff812cda42 R14: 0000000000000100 R15: ffff88001fd59fd8
[863910.149479] FS:  00007f75890ff700(0000) GS:ffff8800034f1000(0000) knlGS:0000000000000000
[863910.149479] CS:  e033 DS: 002b ES: 002b CR0: 000000008005003b
[863910.149479] CR2: 00007f7588cb45f0 CR3: 0000000001001000 CR4: 0000000000002660
[863910.149479] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[863910.149479] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[863910.149479] Call Trace:
[863910.149479]  <IRQ>  [<ffffffff8100e635>] ? xen_force_evtchn_callback+0x9/0xa
[863910.149479]  [<ffffffff8100ecf2>] ? check_events+0x12/0x20
[863910.149479]  [<ffffffff8100ec99>] ? xen_irq_enable_direct_end+0x0/0x7
[863910.149479]  [<ffffffff8105b6e0>] ? run_timer_softirq+0x196/0x268
[863910.149479]  [<ffffffff81054dbf>] ? __do_softirq+0xdd/0x1a6
[863910.149479]  [<ffffffff811f2b93>] ? __xen_evtchn_do_upcall+0x245/0x28d
[863910.149479]  [<ffffffff81012cac>] ? call_softirq+0x1c/0x30
[863910.149479]  [<ffffffff8101422b>] ? do_softirq+0x3f/0x7c
[863910.149479]  [<ffffffff81054c2f>] ? irq_exit+0x36/0x76
[863910.149479]  [<ffffffff811f3384>] ? xen_evtchn_do_upcall+0x33/0x42
[863910.149479]  [<ffffffff81012cfe>] ? xen_do_hypervisor_callback+0x1e/0x30
[863910.149479]  <EOI>  [<ffffffff810093aa>] ? hypercall_page+0x3aa/0x1001
[863910.149479]  [<ffffffff810093aa>] ? hypercall_page+0x3aa/0x1001
[863910.149479]  [<ffffffff8100e6b3>] ? xen_safe_halt+0xc/0x15
[863910.149479]  [<ffffffff8100bfc7>] ? xen_idle+0x37/0x40
[863910.149479]  [<ffffffff81010e97>] ? cpu_idle+0xa2/0xda
[863910.149479]  [<ffffffff8100ec99>] ? xen_irq_enable_direct_end+0x0/0x7
[863910.149479]  [<ffffffff81302947>] ? cpu_bringup+0x6d/0x72

-- 
Valentin

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#599161; Package src:linux-2.6. (Fri, 26 Oct 2012 13:39:03 GMT) (full text, mbox, link).

Acknowledgement sent to Ian Campbell <ijc@hellion.org.uk>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Fri, 26 Oct 2012 13:39:03 GMT) (full text, mbox, link).

Message #109 received at 599161@bugs.debian.org (full text, mbox, reply):

[Message part 1 (text/plain, inline)]

Hi all,

I've BCC'd a number of people who have reported seeing this bug at
various times in the past.

If you can still repro I'd appreciate it if you could give the patch in
http://marc.info/?l=xen-devel&m=135049062216685&w=2 (also attached) a go
and report back success/failure and the output of the debugging messages
produced.

Thanks,
Ian.

-- 
Ian Campbell
Current Noise: Death - Evil Dead

Executive ability is prominent in your make-up.

[00-tsc-debug (text/x-patch, attachment)]

Acknowledgement sent to Mauro <mrsanna1@gmail.com>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Fri, 26 Oct 2012 18:27:03 GMT) (full text, mbox, link).

Message #114 received at 599161@bugs.debian.org (full text, mbox, reply):

On 26 October 2012 14:59, Ian Campbell <ijc@hellion.org.uk> wrote:
> Hi all,
>
> I've BCC'd a number of people who have reported seeing this bug at
> various times in the past.
>
> If you can still repro I'd appreciate it if you could give the patch in
> http://marc.info/?l=xen-devel&m=135049062216685&w=2 (also attached) a go
> and report back success/failure and the output of the debugging messages
> produced.

Is that patch for amd64 architectures?

Acknowledgement sent to Ian Campbell <ijc@hellion.org.uk>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Fri, 26 Oct 2012 18:42:03 GMT) (full text, mbox, link).

Message #119 received at 599161@bugs.debian.org (full text, mbox, reply):

On Fri, 2012-10-26 at 19:25 +0100, Mauro wrote:
> On 26 October 2012 14:59, Ian Campbell <ijc@hellion.org.uk> wrote:
> > Hi all,
> >
> > I've BCC'd a number of people who have reported seeing this bug at
> > various times in the past.
> >
> > If you can still repro I'd appreciate it if you could give the patch in
> > http://marc.info/?l=xen-devel&m=135049062216685&w=2 (also attached) a go
> > and report back success/failure and the output of the debugging messages
> > produced.
> 
> Is that patch for amd64 architectures?

It is for 32 and 64 bit x86, so yes.

BTW, you were the original recipient of this patch in the thread linked
above.

Ian.
-- 
Ian Campbell


"I'd love to go out with you, but I'm having all my plants neutered."

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#599161; Package src:linux-2.6. (Wed, 07 Nov 2012 10:57:03 GMT) (full text, mbox, link).

Acknowledgement sent to <Philippe.Simonet@swisscom.com>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Wed, 07 Nov 2012 10:57:03 GMT) (full text, mbox, link).

Message #124 received at 599161@bugs.debian.org (full text, mbox, reply):

Hi Ian

I compiled a patched hypervisor for Mauro. It has been running for many days and the overflow occurred, 
without clock jumps:

> (XEN) XXX plt_overflow: plt_now=5ece12d34128 plt_wrap=5ece12d09306
> now=5ece12d16292 old_stamp=35c7c new_stamp=800366a5
> plt_stamp64=15b800366a5 plt_mask=ffffffff tsc=e3839fd23854
> tsc_stamp=e3839fcb0273

(below is the complete xm dmesg output)

Did that help you? Do you need more info?

thanks and regards

Philippe


> -----Original Message-----
> From: Mauro [mailto:mrsanna1@gmail.com]
> Sent: Wednesday, November 07, 2012 10:12 AM
> To: Simonet Philippe, ITS-OUS-OP-IFM-NW-IPE
> Subject: Re: [Xen-devel] [Xen-users] Re: Xen 4 TSC problems
> 
> Hello, no news until now there aren't clock jumps.
> Here is xm dmesg:
> 
> xm dmesg
> (XEN) Xen version 4.0.1 (Debian 4.0.1-5.4) (ultrotter@debian.org) (gcc
> version 4.4.5 (Debian 4.4.5-8) ) Mon Oct 29 14:42:12 CET 2012
> (XEN) Bootloader: GRUB 1.98+20100804-14+squeeze1
> (XEN) Command line: placeholder dom0_mem=3072M loglvl=warning
> guest_loglvl=warning
> (XEN) Video information:
> (XEN)  VGA is text mode 80x25, font 8x16
> (XEN)  VBE/DDC methods: V2; EDID transfer time: 2 seconds
> (XEN) Disc information:
> (XEN)  Found 2 MBR signatures
> (XEN)  Found 2 EDD information structures
> (XEN) Xen-e820 RAM map:
> (XEN)  0000000000000000 - 000000000009f400 (usable)
> (XEN)  000000000009f400 - 00000000000a0000 (reserved)
> (XEN)  00000000000f0000 - 0000000000100000 (reserved)
> (XEN)  0000000000100000 - 00000000cfd43000 (usable)
> (XEN)  00000000cfd43000 - 00000000cfd4c000 (ACPI data)
> (XEN)  00000000cfd4c000 - 00000000cfd4d000 (usable)
> (XEN)  00000000cfd4d000 - 00000000d0000000 (reserved)
> (XEN)  00000000e0000000 - 00000000f0000000 (reserved)
> (XEN)  00000000fec00000 - 00000000fed00000 (reserved)
> (XEN)  00000000fee00000 - 00000000fee10000 (reserved)
> (XEN)  00000000ffc00000 - 0000000100000000 (reserved)
> (XEN)  0000000100000000 - 000000102ffff000 (usable)
> (XEN) ACPI: RSDP 000F4F20, 0024 (r2 HP    )
> (XEN) ACPI: XSDT CFD43900, 007C (r1 HP     ProLiant        2         162E)
> (XEN) ACPI: FACP CFD439C0, 00F4 (r3 HP     ProLiant        2         162E)
> (XEN) ACPI: DSDT CFD43AC0, 30C9 (r1 HP         DSDT        1 INTL 20030228)
> (XEN) ACPI: FACS CFD43100, 0040
> (XEN) ACPI: SPCR CFD43140, 0050 (r1 HP     SPCRRBSU        1         162E)
> (XEN) ACPI: MCFG CFD431C0, 003C (r1 HP     ProLiant        1             0)
> (XEN) ACPI: HPET CFD43200, 0038 (r1 HP     ProLiant        2         162E)
> (XEN) ACPI: FFFF CFD43240, 0064 (r2 HP     P61             2         162E)
> (XEN) ACPI: SPMI CFD432C0, 0040 (r5 HP     ProLiant        1         162E)
> (XEN) ACPI: ERST CFD43300, 01D0 (r1 HP     ProLiant        1         162E)
> (XEN) ACPI: APIC CFD43500, 0176 (r1 HP     ProLiant        2             0)
> (XEN) ACPI: FFFF CFD43680, 0176 (r1 HP     ProLiant        1         162E)
> (XEN) ACPI: BERT CFD43800, 0030 (r1 HP     ProLiant        1         162E)
> (XEN) ACPI: HEST CFD43840, 00BC (r1 HP     ProLiant        1         162E)
> (XEN) System RAM: 65532MB (67105672kB)
> (XEN) Domain heap initialised
> (XEN) Processor #0 6:15 APIC version 20
> (XEN) Processor #8 6:15 APIC version 20
> (XEN) Processor #16 6:15 APIC version 20
> (XEN) Processor #24 6:15 APIC version 20
> (XEN) Processor #1 6:15 APIC version 20
> (XEN) Processor #9 6:15 APIC version 20
> (XEN) Processor #17 6:15 APIC version 20
> (XEN) Processor #25 6:15 APIC version 20
> (XEN) Processor #2 6:15 APIC version 20
> (XEN) Processor #10 6:15 APIC version 20
> (XEN) Processor #18 6:15 APIC version 20
> (XEN) Processor #26 6:15 APIC version 20
> (XEN) Processor #3 6:15 APIC version 20
> (XEN) Processor #11 6:15 APIC version 20
> (XEN) Processor #19 6:15 APIC version 20
> (XEN) Processor #27 6:15 APIC version 20
> (XEN) IOAPIC[0]: apic_id 1, version 32, address 0xfec00000, GSI 0-23
> (XEN) IOAPIC[1]: apic_id 2, version 32, address 0xfec80000, GSI 24-47
> (XEN) IOAPIC[2]: apic_id 3, version 32, address 0xfec81000, GSI 48-71
> (XEN) IOAPIC[3]: apic_id 4, version 32, address 0xfec81800, GSI 72-95
> (XEN) Enabling APIC mode:  Phys.  Using 4 I/O APICs
> (XEN) Using scheduler: SMP Credit Scheduler (credit)
> (XEN) Detected 2400.128 MHz processor.
> (XEN) Initing memory sharing.
> (XEN) VMX: Supported advanced features:
> (XEN)  - APIC MMIO access virtualisation
> (XEN)  - APIC TPR shadow
> (XEN)  - Virtual NMI
> (XEN)  - MSR direct-access bitmap
> (XEN) HVM: ASIDs disabled.
> (XEN) HVM: VMX enabled
> (XEN) I/O virtualisation disabled
> (XEN) Total of 16 processors activated.
> (XEN) ENABLING IO-APIC IRQs
> (XEN)  -> Using new ACK method
> (XEN) checking TSC synchronization across 16 CPUs:
> (XEN) CPU#14 had 3 usecs TSC skew, fixed it up.
> (XEN) Platform timer is 14.318MHz HPET
> (XEN) Allocated console ring of 32 KiB.
> (XEN) Brought up 16 CPUs
> (XEN) *** LOADING DOMAIN 0 ***
> (XEN)  Xen  kernel: 64-bit, lsb, compat32
> (XEN)  Dom0 kernel: 64-bit, PAE, lsb, paddr 0x1000000 -> 0x1708000
> (XEN) PHYSICAL MEMORY ARRANGEMENT:
> (XEN)  Dom0 alloc.:   000000083c000000->0000000840000000 (770048 pages
> to be allocated)
> (XEN) VIRTUAL MEMORY ARRANGEMENT:
> (XEN)  Loaded kernel: ffffffff81000000->ffffffff81708000
> (XEN)  Init. ramdisk: ffffffff81708000->ffffffff81efb000
> (XEN)  Phys-Mach map: ffffffff81efb000->ffffffff824fb000
> (XEN)  Start info:    ffffffff824fb000->ffffffff824fb4b4
> (XEN)  Page tables:   ffffffff824fc000->ffffffff82513000
> (XEN)  Boot stack:    ffffffff82513000->ffffffff82514000
> (XEN)  TOTAL:         ffffffff80000000->ffffffff82800000
> (XEN)  ENTRY ADDRESS: ffffffff81531200
> (XEN) Dom0 has maximum 16 VCPUs
> (XEN) Scrubbing Free RAM:
> .....................................................................................................................
> .....................................................................................................................
> .....................................................................................................................
> .....................................................................................................................
> .....................................................................................................................
> ................................done.
> (XEN) Xen trace buffers: disabled
> (XEN) Std. Loglevel: Errors and warnings
> (XEN) Guest Loglevel: Errors and warnings
> (XEN) Xen is relinquishing VGA console.
> (XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to
> Xen)
> (XEN) Freed 176kB init memory.
> (XEN) XXX plt_overflow: plt_now=5ece12d34128 plt_wrap=5ece12d09306
> now=5ece12d16292 old_stamp=35c7c new_stamp=800366a5
> plt_stamp64=15b800366a5 plt_mask=ffffffff tsc=e3839fd23854
> tsc_stamp=e3839fcb0273
> 
> 





> -----Original Message-----
> From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-
> bounces@lists.xen.org] On Behalf Of Ian Campbell
> Sent: Friday, October 26, 2012 3:00 PM
> To: xen-devel@lists.xen.org
> Cc: 599161@bugs.debian.org
> Subject: [Xen-devel] #599161: Xen debug patch for the "clock shifts by 50
> minutes" bug.
> 
> Hi all,
> 
> I've BCC'd a number of people who have reported seeing this bug at various
> times in the past.
> 
> If you can still repro I'd appreciate it if you could give the patch in
> http://marc.info/?l=xen-devel&m=135049062216685&w=2 (also attached) a
> go and report back success/failure and the output of the debugging
> messages produced.
> 
> Thanks,
> Ian.
> 
> --
> Ian Campbell
> Current Noise: Death - Evil Dead
> 
> Executive ability is prominent in your make-up.

Message #129 received at 599161@bugs.debian.org (full text, mbox, reply):

On Wed, 2012-11-07 at 10:10 +0000, Philippe.Simonet@swisscom.com wrote:
> Hi Ian

Thanks for doing this test.

> i compiled  a patched hypervisor for Mauro, it is running since many
> days and the overflow occured, without clock jumps

So just to be clear you saw this logging occur *without* the 50 minute
jump in time? That's good!

> > (XEN) XXX plt_overflow: plt_now=5ece12d34128 plt_wrap=5ece12d09306
> > now=5ece12d16292 old_stamp=35c7c new_stamp=800366a5
> > plt_stamp64=15b800366a5 plt_mask=ffffffff tsc=e3839fd23854
> > tsc_stamp=e3839fcb0273
> 
> (below is the complete xm dmesg output)
> 
> did that help you ? do you need more info ? 

I'll leave this to Keir (who wrote the debugging patch) to answer but it
looks to me like it should be useful!

Thanks again.

Ian.


Acknowledgement sent to "Jan Beulich" <JBeulich@suse.com>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Wed, 07 Nov 2012 17:30:03 GMT) (full text, mbox, link).

Message #134 received at 599161@bugs.debian.org (full text, mbox, reply):

>>> On 07.11.12 at 11:10, <Philippe.Simonet@swisscom.com> wrote:
> i compiled  a patched hypervisor for Mauro, it is running since many days 
> and the overflow occured, 
> without clock jumps
> 
>> (XEN) XXX plt_overflow: plt_now=5ece12d34128 plt_wrap=5ece12d09306
>> now=5ece12d16292 old_stamp=35c7c new_stamp=800366a5
>> plt_stamp64=15b800366a5 plt_mask=ffffffff tsc=e3839fd23854
>> tsc_stamp=e3839fcb0273

i.e. we have (in order of time)

 plt_wrap=5ece12d09306
      now=5ece12d16292
  plt_now=5ece12d34128

which is exactly the inverse order of how things should be (now not
necessarily being in the middle). Nor should plt_now and plt_wrap be
that close together. So far I have no idea how this can be explained.

Jan
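
The ordering described above can be checked directly from the reported values. This is a minimal sketch, assuming the three fields are Xen system-time values in nanoseconds printed in hexadecimal (the debug line itself does not state the unit); the dictionary keys mirror the log fields and everything else is illustrative:

```python
# Values copied from the "XXX plt_overflow" debug line (hexadecimal).
stamps = {
    "plt_now": 0x5ECE12D34128,
    "plt_wrap": 0x5ECE12D09306,
    "now": 0x5ECE12D16292,
}

# Sort the field names by value, earliest first.
order = sorted(stamps, key=stamps.get)
print(order)

# Distance between the two platform-timer values (assumed to be nanoseconds).
gap_ns = stamps["plt_now"] - stamps["plt_wrap"]
print(gap_ns)
```

Under the nanosecond assumption the two platform-timer values are only about 176 microseconds apart, which is the "that close together" observation made above.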

Acknowledgement sent to Keir Fraser <keir@xen.org>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Wed, 07 Nov 2012 17:45:03 GMT) (full text, mbox, link).

Message #139 received at 599161@bugs.debian.org (full text, mbox, reply):

On 07/11/2012 13:22, "Ian Campbell" <ijc@hellion.org.uk> wrote:

>>> (XEN) XXX plt_overflow: plt_now=5ece12d34128 plt_wrap=5ece12d09306
>>> now=5ece12d16292 old_stamp=35c7c new_stamp=800366a5
>>> plt_stamp64=15b800366a5 plt_mask=ffffffff tsc=e3839fd23854
>>> tsc_stamp=e3839fcb0273
>> 
>> (below is the complete xm dmesg output)
>> 
>> did that help you ? do you need more info ?
> 
> I'll leave this to Keir (who wrote the debugging patch) to answer but it
> looks to me like it should be useful!

I'm scratching my head. plt_wrap is earlier than plt_now, which should be
impossible. plt_stamp64 oddly has low 32 bits identical to new_stamp. That
seems very very improbable!

I wonder whether the overflow handling should just be removed, or made
conditional on a command-line parameter, or on the 32-bit platform counter
being at least somewhat likely to overflow before a softirq occurs -- it
seems lots of systems are using 14MHz HPET, and that gives us a couple of
minutes for the plt_overflow softirq to do its work before overflow occurs.
I think we would notice that outage in other ways. :)

 -- Keir
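
The wrap period of a 32-bit counter at this clock rate can be worked out with simple arithmetic. This is a sketch only: the exact rate of 14,318,180 Hz is an assumed nominal value for the "14.318MHz HPET" reported in the boot log, and the 32-bit width is taken from plt_mask=ffffffff.

```python
HPET_HZ = 14_318_180   # assumed nominal rate for the "14.318MHz HPET" in the log
COUNTER_BITS = 32      # width implied by plt_mask=ffffffff

# Time for the hardware counter to run through all 2**32 values once.
wrap_seconds = (1 << COUNTER_BITS) / HPET_HZ
print(f"wrap period: {wrap_seconds:.1f} s ({wrap_seconds / 60:.1f} min)")

# How many wrap periods fit into the 50-minute jump named in the bug title.
periods_in_jump = (50 * 60) / wrap_seconds
print(f"50 minutes = {periods_in_jump:.2f} wrap periods")
```

With these assumptions the counter wraps roughly every 300 seconds, so the window is closer to five minutes than two, and the 50 minutes in the bug title corresponds to about ten wrap periods. The arithmetic alone does not show that the two are causally related.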

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#599161; Package src:linux-2.6. (Thu, 08 Nov 2012 09:42:03 GMT) (full text, mbox, link).

Acknowledgement sent to "Jan Beulich" <JBeulich@suse.com>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Thu, 08 Nov 2012 09:42:03 GMT) (full text, mbox, link).

Message #144 received at 599161@bugs.debian.org (full text, mbox, reply):

>>> On 07.11.12 at 18:40, Keir Fraser <keir@xen.org> wrote:
> On 07/11/2012 13:22, "Ian Campbell" <ijc@hellion.org.uk> wrote:
> 
>>>> (XEN) XXX plt_overflow: plt_now=5ece12d34128 plt_wrap=5ece12d09306
>>>> now=5ece12d16292 old_stamp=35c7c new_stamp=800366a5
>>>> plt_stamp64=15b800366a5 plt_mask=ffffffff tsc=e3839fd23854
>>>> tsc_stamp=e3839fcb0273
>>> 
>>> (below is the complete xm dmesg output)
>>> 
>>> did that help you ? do you need more info ?
>> 
>> I'll leave this to Keir (who wrote the debugging patch) to answer but it
>> looks to me like it should be useful!
> 
> I'm scratching my head. plt_wrap is earlier than plt_now, which should be
> impossible. plt_stamp64 oddly has low 32 bits identical to new_stamp. That
> seems very very improbable!

Is it? My understanding was that plt_stamp64 is just a software
extension to the more narrow HW counter, and hence the low
plt_mask bits would always be expected to be identical.

The plt_wrap < plt_now thing of course is entirely unexplainable
to me too: Considering that plt_scale doesn't change at all
post-boot, apart from memory corruption I could only see a memory
access ordering problem to be the reason (platform_timer_stamp
and/or stime_platform_stamp changing despite platform_timer_lock
being held). So maybe taking a snapshot of all three static values
involved in the calculation in __read_platform_stime() between
acquiring the lock and the first call to __read_platform_stime(),
and printing them together with the "live" values in a second
printk() after the one your original patch added could rule that
out.

But the box doesn't even seem to be NUMA (of course it also
doesn't help that the log level was kept restricted - hint, hint,
Philippe), nor does there appear to be any S3 cycle or pCPU
bring-up/-down in between...

Philippe, could you clarify again what CPU model(s) this is being
observed on? (The long times between individual steps forward
with this problem perhaps warrant repeating the basics each
time, as it's otherwise quite cumbersome to always look up old
pieces of information.)

> I wonder whether the overflow handling should just be removed, or made
> conditional on a command-line parameter, or on the 32-bit platform counter
> being at least somewhat likely to overflow before a softirq occurs -- it
> seems lots of systems are using 14MHz HPET, and that gives us a couple of
> minutes for the plt_overflow softirq to do its work before overflow occurs.
> I think we would notice that outage in other ways. :)

Iirc we added this for a good reason - to cover the, however
unlikely, event of Xen running for very long without preemption.
Presumably most of the cases got fixed meanwhile, and indeed
a wraparound time on the order of minutes should make this
superfluous, but as the case here shows, that code did spot a
severe anomaly (whatever that may turn out to be).

Also recall that there are HPET implementations around that tick
at a much higher frequency than 14MHz.

So unless we finally reach the understanding that the code is
flawed, I would rather want to keep it.

Jan

Acknowledgement sent to Keir Fraser <keir@xen.org>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Thu, 08 Nov 2012 10:42:06 GMT) (full text, mbox, link).

Message #149 received at 599161@bugs.debian.org (full text, mbox, reply):

On 08/11/2012 09:39, "Jan Beulich" <JBeulich@suse.com> wrote:

>>>>> (XEN) XXX plt_overflow: plt_now=5ece12d34128 plt_wrap=5ece12d09306
>>>>> now=5ece12d16292 old_stamp=35c7c new_stamp=800366a5
>>>>> plt_stamp64=15b800366a5 plt_mask=ffffffff tsc=e3839fd23854
>>>>> tsc_stamp=e3839fcb0273
>>>> 
>>>> (below is the complete xm dmesg output)
>>>> 
>>>> did that help you ? do you need more info ?
>>> 
>>> I'll leave this to Keir (who wrote the debugging patch) to answer but it
>>> looks to me like it should be useful!
>> 
>> I'm scratching my head. plt_wrap is earlier than plt_now, which should be
>> impossible. plt_stamp64 oddly has low 32 bits identical to new_stamp. That
>> seems very very improbable!
> 
> Is it? My understanding was that plt_stamp64 is just a software
> extension to the more narrow HW counter, and hence the low
> plt_mask bits would always be expected to be identical.

No, plt_stamp is simply the HW counter time at which plt_stamp64 was last
brought up to date. Hence plt_stamp64 is updated as:
 plt_stamp64 += (new_stamp - old_stamp) & plt_mask;

That is why seeing (plt_stamp64 & plt_mask) == new_stamp is very unexpected!

 -- Keir

Acknowledgement sent to Ian Campbell <ijc@hellion.org.uk>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Thu, 08 Nov 2012 11:48:06 GMT) (full text, mbox, link).

Message #154 received at 599161@bugs.debian.org (full text, mbox, reply):

On Wed, 2012-11-07 at 17:40 +0000, Keir Fraser wrote:
> On 07/11/2012 13:22, "Ian Campbell" <ijc@hellion.org.uk> wrote:
> 
> >>> (XEN) XXX plt_overflow: plt_now=5ece12d34128 plt_wrap=5ece12d09306
> >>> now=5ece12d16292 old_stamp=35c7c new_stamp=800366a5
> >>> plt_stamp64=15b800366a5 plt_mask=ffffffff tsc=e3839fd23854
> >>> tsc_stamp=e3839fcb0273
> >> 
> >> (below is the complete xm dmesg output)
> >> 
> >> did that help you ? do you need more info ?
> > 
> > I'll leave this to Keir (who wrote the debugging patch) to answer but it
> > looks to me like it should be useful!
> 
> I'm scratching my head. plt_wrap is earlier than plt_now, which should be
> impossible.

impossible due to guarantees made by the h/w or by construction in Xen.
There appears to be a certain amount of hardware-specificness to the
issue -- so I'm wondering if maybe there are some platforms whose tsc is
not as monotonically increasing as it needs to be...

>  plt_stamp64 oddly has low 32 bits identical to new_stamp. That
> seems very very improbable!

Does this code run on all cpus or just one? Is it always the same one?

> I wonder whether the overflow handling should just be removed, or made
> conditional on a command-line parameter, or on the 32-bit platform counter
> being at least somewhat likely to overflow before a softirq occurs -- it
> seems lots of systems are using 14MHz HPET, and that gives us a couple of
> minutes for the plt_overflow softirq to do its work before overflow occurs.
> I think we would notice that outage in other ways. :)
> 
>  -- Keir

Acknowledgement sent to Keir Fraser <keir@xen.org>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Thu, 08 Nov 2012 12:57:06 GMT) (full text, mbox, link).

Message #159 received at 599161@bugs.debian.org (full text, mbox, reply):

On 08/11/2012 11:43, "Ian Campbell" <ijc@hellion.org.uk> wrote:

>>> I'll leave this to Keir (who wrote the debugging patch) to answer but it
>>> looks to me like it should be useful!
>> 
>> I'm scratching my head. plt_wrap is earlier than plt_now, which should be
>> impossible.
> 
> impossible due to guarantees made by the h/w or by construction in Xen.

That's a question, right? By construction in Xen.

> There appears to be a certain amount of hardware-specificness to the
> issue -- so I'm wondering if maybe there are some platforms whose tsc is
> not as monotonically increasing as it needs to be...

plt_* timestamps are not derived from TSC at all.

>>  plt_stamp64 oddly has low 32 bits identical to new_stamp. That
>> seems very very improbable!
> 
> Does this code run on all cpus or just one? Is it always the same one?

Always cpu0.

 -- Keir
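
One way to picture "by construction": if the platform system time is a fixed base plus a scaled counter delta, then evaluating it one full counter period later must give a later time. The following is a simplified model and not the Xen source; the function name, the zero base values, and the 14,318,180 Hz rate are assumptions made for illustration.

```python
HPET_HZ = 14_318_180   # assumed nominal HPET rate
MASK = 0xFFFFFFFF      # 32-bit counter, as in plt_mask=ffffffff

def model_stime_ns(counter64, base_counter64=0, base_ns=0, hz=HPET_HZ):
    """Hypothetical model: base time plus elapsed ticks converted to ns."""
    return base_ns + (counter64 - base_counter64) * 1_000_000_000 // hz

plt_stamp64 = 0x15B800366A5                      # value from the debug line
t_now = model_stime_ns(plt_stamp64)              # time at the current stamp
t_wrap = model_stime_ns(plt_stamp64 + MASK + 1)  # time one full period later

print((t_wrap - t_now) / 1e9)  # seconds between the two, in this model
```

In this model the second value is always roughly 300 seconds after the first. If plt_now and plt_wrap in the debug output correspond to these two evaluations (an assumption about the debugging patch), the logged plt_wrap being slightly earlier than plt_now cannot come from the counter arithmetic alone.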

Acknowledgement sent to <Philippe.Simonet@swisscom.com>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Thu, 08 Nov 2012 13:51:03 GMT) (full text, mbox, link).

Message #164 received at 599161@bugs.debian.org (full text, mbox, reply):

Hi Mauro, 

This is a question for you:

> Philippe, could you clarify again what CPU model(s) this is being observed on
> (the long times between individual steps forward with this problem perhaps
> warrant repeating the basics each time, as it's otherwise quite cumbersome
> to always look up old pieces of information).

Can you provide this information?
	cat /proc/cpuinfo
	cat /proc/meminfo
	hardware information (manufacturer, model, URLs, ...)

Thanks, Philippe


> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Thursday, November 08, 2012 10:40 AM
> To: Simonet Philippe, ITS-OUS-OP-IFM-NW-IPE; Keir Fraser
> Cc: 599161@bugs.debian.org; mrsanna1@gmail.com; Ian Campbell; xen-
> devel@lists.xen.org
> Subject: Re: [Xen-devel] #599161: Xen debug patch for the "clock shifts by 50
> minutes" bug.
> 
> >>> On 07.11.12 at 18:40, Keir Fraser <keir@xen.org> wrote:
> > On 07/11/2012 13:22, "Ian Campbell" <ijc@hellion.org.uk> wrote:
> >
> >>>> (XEN) XXX plt_overflow: plt_now=5ece12d34128
> plt_wrap=5ece12d09306
> >>>> now=5ece12d16292 old_stamp=35c7c new_stamp=800366a5
> >>>> plt_stamp64=15b800366a5 plt_mask=ffffffff tsc=e3839fd23854
> >>>> tsc_stamp=e3839fcb0273
> >>>
> >>> (below is the complete xm dmesg output)
> >>>
> >>> did that help you ? do you need more info ?
> >>
> >> I'll leave this to Keir (who wrote the debugging patch) to answer but
> >> it looks to me like it should be useful!
> >
> > I'm scratching my head. plt_wrap is earlier than plt_now, which should
> > be impossible. plt_stamp64 oddly has low 32 bits identical to
> > new_stamp. That seems very very improbable!
> 
> Is it? My understanding was that plt_stamp64 is just a software extension to
> the more narrow HW counter, and hence the low plt_mask bits would always
> be expected to be identical.
> 
> The plt_wrap < plt_now thing of course is entirely unexplainable to me too:
> Considering that plt_scale doesn't change at all post- boot, apart from
> memory corruption I could only see an memory access ordering problem to
> be the reason (platform_timer_stamp and/or stime_platform_stamp
> changing despite platform_timer_lock being held. So maybe taking a
> snapshot of all three static values involved in the calculation in
> __read_platform_stime() between acquiring the lock and the first call to
> __read_platform_stime(), and printing them together with the "live" values
> in a second
> printk() after the one your original patch added could rule that out.
> 
> But the box doesn't even seem to be NUMA (of course it also doesn't help
> that the log level was kept restricted - hint, hint, Philippe), not does there
> appear to be any S3 cycle or pCPU bring-up/-down in between...
> 
> Philippe, could you clarify again what CPU model(s) this is being observed on
> (the long times between individual steps forward with this problem perhaps
> warrant repeating the basics each time, as it's otherwise quite cumbersome
> to always look up old pieces of information).
> 
> > I wonder whether the overflow handling should just be removed, or made
> > conditional on a command-line parameter, or on the 32-bit platform
> > counter being at least somewhat likely to overflow before a softirq
> > occurs -- it seems lots of systems are using 14MHz HPET, and that
> > gives us a couple of minutes for the plt_overflow softirq to do its work
> before overflow occurs.
> > I think we would notice that outage in other ways. :)
> 
> Iirc we added this for a good reason - to cover the, however unlikely, event
> of Xen running for very long without preemption.
> Presumably most of the cases got fixed meanwhile, and indeed a
> wraparound time on the order of minutes should make this superfluous, but
> as the case here shows that code did spot a severe anomaly (whatever that
> may turn out to be).
> 
> Also recall that there are HPET implementations around that tick at a much
> higher frequency than 14MHz.
> 
> So unless we finally reach the understanding that the code is flawed, I would
> rather want to keep it.
> 
> Jan

Acknowledgement sent to "Jan Beulich" <JBeulich@suse.com>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Thu, 08 Nov 2012 13:57:03 GMT) (full text, mbox, link).

Message #169 received at 599161@bugs.debian.org (full text, mbox, reply):

>>> On 08.11.12 at 11:38, Keir Fraser <keir@xen.org> wrote:
> On 08/11/2012 09:39, "Jan Beulich" <JBeulich@suse.com> wrote:
> 
>>>>>> (XEN) XXX plt_overflow: plt_now=5ece12d34128 plt_wrap=5ece12d09306
>>>>>> now=5ece12d16292 old_stamp=35c7c new_stamp=800366a5
>>>>>> plt_stamp64=15b800366a5 plt_mask=ffffffff tsc=e3839fd23854
>>>>>> tsc_stamp=e3839fcb0273
>>>>> 
>>>>> (below is the complete xm dmesg output)
>>>>> 
>>>>> did that help you ? do you need more info ?
>>>> 
>>>> I'll leave this to Keir (who wrote the debugging patch) to answer but it
>>>> looks to me like it should be useful!
>>> 
>>> I'm scratching my head. plt_wrap is earlier than plt_now, which should be
>>> impossible. plt_stamp64 oddly has low 32 bits identical to new_stamp. That
>>> seems very very improbable!
>> 
>> Is it? My understanding was that plt_stamp64 is just a software
>> extension to the more narrow HW counter, and hence the low
>> plt_mask bits would always be expected to be identical.
> 
> No, plt_stamp is simply the HW counter time at which plt_stamp64 was last
> brought up to date. Hence plt_stamp64 is updated as:
>  plt_stamp64 += (new_stamp - old_stamp) & plt_mask;

I concur: Given that what old_stamp is here was new_stamp for
the last update, we should simply have

stamp64 = s0 + (s1 - s0) + (s2 - s1) + ...

(of course with the mask applied on each addend), which (for the
low bits) is the same as just new_stamp.

Jan
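
The telescoping argument above can be checked with a minimal sketch. It assumes a 32-bit counter (plt_mask=ffffffff, as in the log) and that the wide counter starts out equal to the first hardware reading. The function name extend_counter and the sample readings are hypothetical; only the update rule "plt_stamp64 += (new_stamp - old_stamp) & plt_mask" comes from the thread.

```python
PLT_MASK = 0xFFFFFFFF  # 32-bit platform counter, as in the logged plt_mask


def extend_counter(readings, mask=PLT_MASK):
    """Fold successive narrow counter readings into a wide software counter.

    Assumes the wide counter starts equal to the first reading (s0) and that
    the hardware counter wraps at most once between two readings.
    """
    stamp64 = readings[0]
    old_stamp = readings[0]
    history = []
    for new_stamp in readings[1:]:
        # Masked difference handles a wrap of the narrow counter.
        stamp64 += (new_stamp - old_stamp) & mask
        old_stamp = new_stamp
        history.append((new_stamp, stamp64))
    return history


# Hypothetical readings, including two wraps of the 32-bit counter.
readings = [0x00000010, 0x7FFFFFFF, 0xFFFFFF00, 0x00035C7C, 0x800366A5, 0x00000042]
history = extend_counter(readings)
for new_stamp, stamp64 in history:
    # The low 32 bits of the wide counter always equal the latest reading.
    assert stamp64 & PLT_MASK == new_stamp
```

Under these assumptions the high bits only count the wraps, so a plt_stamp64 whose low 32 bits match new_stamp is the expected outcome rather than a coincidence.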

Acknowledgement sent to Keir Fraser <keir@xen.org>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Thu, 08 Nov 2012 14:09:03 GMT) (full text, mbox, link).

Message #174 received at 599161@bugs.debian.org (full text, mbox, reply):

On 08/11/2012 13:53, "Jan Beulich" <JBeulich@suse.com> wrote:

>>> Is it? My understanding was that plt_stamp64 is just a software
>>> extension to the more narrow HW counter, and hence the low
>>> plt_mask bits would always be expected to be identical.
>> 
>> No, plt_stamp is simply the HW counter time at which plt_stamp64 was last
>> brought up to date. Hence plt_stamp64 is updated as:
>>  plt_stamp64 += (new_stamp - old_stamp) & plt_mask;
> 
> I concur

Well, no, you don't really. You're about to point out the flaw in my
reasoning...

> : Given that what old_stamp is here was new_stamp for
> the last update, we should simply have
> 
> stamp64 = s0 + (s1 - s0) + (s2 - s1) + ...
> 
> (of course with the mask applied on each addend), which (for the
> low bits) is the same as just new_stamp.

Very good point. Silly me. Then the observed value of plt_stamp64 makes
perfect sense.

 -- Keir

Acknowledgement sent to Keir Fraser <keir@xen.org>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Thu, 08 Nov 2012 14:21:07 GMT) (full text, mbox, link).

Message #179 received at 599161@bugs.debian.org (full text, mbox, reply):

On 07/11/2012 17:40, "Keir Fraser" <keir@xen.org> wrote:

> On 07/11/2012 13:22, "Ian Campbell" <ijc@hellion.org.uk> wrote:
> 
>>>> (XEN) XXX plt_overflow: plt_now=5ece12d34128 plt_wrap=5ece12d09306
>>>> now=5ece12d16292 old_stamp=35c7c new_stamp=800366a5
>>>> plt_stamp64=15b800366a5 plt_mask=ffffffff tsc=e3839fd23854
>>>> tsc_stamp=e3839fcb0273
>>> 
>>> (below is the complete xm dmesg output)
>>> 
>>> did that help you ? do you need more info ?
>> 
>> I'll leave this to Keir (who wrote the debugging patch) to answer but it
>> looks to me like it should be useful!
> 
> I'm scratching my head. plt_wrap is earlier than plt_now, which should be
> impossible. plt_stamp64 oddly has low 32 bits identical to new_stamp. That
> seems very very improbable!

Jan has pointed out that the value of plt_stamp64 makes perfect sense, and
will in fact always have low 32 bits identical to new_stamp. At least that
is explained.

So, the question is then why plt_now (== __read_platform_stime(15b800366a5))
is greater than plt_wrap (== __read_platform_stime(15c800366a5)). Perhaps
the scale_delta() logic is failing for some reason, but we do use it a lot
elsewhere!

 -- Keir

> I wonder whether the overflow handling should just be removed, or made
> conditional on a command-line parameter, or on the 32-bit platform counter
> being at least somewhat likely to overflow before a softirq occurs -- it
> seems lots of systems are using 14MHz HPET, and that gives us a couple of
> minutes for the plt_overflow softirq to do its work before overflow occurs.
> I think we would notice that outage in other ways. :)
> 
>  -- Keir
> 
>
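
The arithmetic on the logged values can be checked as follows. This is only a sketch: it assumes that plt_now, plt_wrap and now are Xen system time in nanoseconds, and the variable names simply mirror the fields of the debug line.

```python
# Values copied from the "XXX plt_overflow" debug line quoted above.
plt_now = 0x5ECE12D34128
plt_wrap = 0x5ECE12D09306
now = 0x5ECE12D16292
plt_stamp64 = 0x15B800366A5
plt_mask = 0xFFFFFFFF

# Counter value fed into the plt_wrap calculation: one full counter period
# later than the value used for plt_now.
wrap_counter = plt_stamp64 + plt_mask + 1
print(hex(wrap_counter))         # 0x15c800366a5

# plt_wrap should be later than plt_now, yet the logged values are reversed.
print(plt_now - plt_wrap)        # 175650 (nanoseconds, assuming ns units)
print(plt_wrap < now < plt_now)  # True
```

So the larger counter value produced the smaller timestamp, by roughly 176 microseconds if the units are nanoseconds.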

Acknowledgement sent to Ian Campbell <ijc@hellion.org.uk>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Thu, 08 Nov 2012 14:33:03 GMT) (full text, mbox, link).

Message #184 received at 599161@bugs.debian.org (full text, mbox, reply):

On Thu, 2012-11-08 at 12:54 +0000, Keir Fraser wrote:
> On 08/11/2012 11:43, "Ian Campbell" <ijc@hellion.org.uk> wrote:
> 
> >>> I'll leave this to Keir (who wrote the debugging patch) to answer but it
> >>> looks to me like it should be useful!
> >> 
> >> I'm scratching my head. plt_wrap is earlier than plt_now, which should be
> >> impossible.
> > 
> > impossible due to guarantees made by the h/w or by construction in Xen.
> 
> That's a question, right? 

Yes, sorry.

> By construction in Xen.
> 
> > There appears to be a certain amount of hardware-specificness to the
> > issue -- so I'm wondering if maybe there are some platforms whose tsc is
> > not as monotonically increasing as it needs to be...
> 
> plt_* timestamps are not derived from TSC at all.

I see, rather it is derived from the platform_timesource which could be
HPET, pmtimer, pit etc but in this case (according to the provided xm
dmesg) appears to be a 14MHz HPET.

So I guess s/tsc/HPET/ in my original thought...

> >>  plt_stamp64 oddly has low 32 bits identical to new_stamp. That
> >> seems very very improbable!
> > 
> > Does this code run on all cpus or just one? Is it always the same one?
> 
> Always cpu0.

So it's not cross cpu drift then.

Ian.

Acknowledgement sent to Ian Campbell <ijc@hellion.org.uk>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Thu, 08 Nov 2012 14:33:05 GMT) (full text, mbox, link).

Message #189 received at 599161@bugs.debian.org (full text, mbox, reply):

On Thu, 2012-11-08 at 13:47 +0000, Philippe.Simonet@swisscom.com wrote:
> Hi Mauro, 
> 
> that's a question for you : 

I think Jan was asking for information relating to the system you saw
this on -- or are you working on the same systems as Mauro?

Of course additional information from Mauro would be useful too in order
to help spotting any patterns.

> > Philippe, could you clarify again what CPU model(s) this is being observed on
> > (the long times between individual steps forward with this problem perhaps
> > warrant repeating the basics each time, as it's otherwise quite cumbersome
> > to always look up old pieces of information).
> 
> can you provide this information ? 
> 	cat /proc/cpuinfo 	
> 	cat /proc/meminfo
> 	hardware information (manufacturer, model, urls, ...)
> 
> Thanks, Philippe
> 
> 
> > -----Original Message-----
> > From: Jan Beulich [mailto:JBeulich@suse.com]
> > Sent: Thursday, November 08, 2012 10:40 AM
> > To: Simonet Philippe, ITS-OUS-OP-IFM-NW-IPE; Keir Fraser
> > Cc: 599161@bugs.debian.org; mrsanna1@gmail.com; Ian Campbell; xen-
> > devel@lists.xen.org
> > Subject: Re: [Xen-devel] #599161: Xen debug patch for the "clock shifts by 50
> > minutes" bug.
> > 
> > >>> On 07.11.12 at 18:40, Keir Fraser <keir@xen.org> wrote:
> > > On 07/11/2012 13:22, "Ian Campbell" <ijc@hellion.org.uk> wrote:
> > >
> > >>>> (XEN) XXX plt_overflow: plt_now=5ece12d34128
> > plt_wrap=5ece12d09306
> > >>>> now=5ece12d16292 old_stamp=35c7c new_stamp=800366a5
> > >>>> plt_stamp64=15b800366a5 plt_mask=ffffffff tsc=e3839fd23854
> > >>>> tsc_stamp=e3839fcb0273
> > >>>
> > >>> (below is the complete xm dmesg output)
> > >>>
> > >>> did that help you ? do you need more info ?
> > >>
> > >> I'll leave this to Keir (who wrote the debugging patch) to answer but
> > >> it looks to me like it should be useful!
> > >
> > > I'm scratching my head. plt_wrap is earlier than plt_now, which should
> > > be impossible. plt_stamp64 oddly has low 32 bits identical to
> > > new_stamp. That seems very very improbable!
> > 
> > Is it? My understanding was that plt_stamp64 is just a software extension to
> > the more narrow HW counter, and hence the low plt_mask bits would always
> > be expected to be identical.
> > 
> > > The plt_wrap < plt_now thing of course is entirely unexplainable to me too:
> > > Considering that plt_scale doesn't change at all post-boot, apart from
> > > memory corruption I could only see a memory access ordering problem to
> > > be the reason (platform_timer_stamp and/or stime_platform_stamp
> > > changing despite platform_timer_lock being held). So maybe taking a
> > > snapshot of all three static values involved in the calculation in
> > > __read_platform_stime() between acquiring the lock and the first call to
> > > __read_platform_stime(), and printing them together with the "live" values
> > > in a second printk() after the one your original patch added could rule
> > > that out.
> > > 
> > > But the box doesn't even seem to be NUMA (of course it also doesn't help
> > > that the log level was kept restricted - hint, hint, Philippe), nor does there
> > > appear to be any S3 cycle or pCPU bring-up/-down in between...
> > 
> > Philippe, could you clarify again what CPU model(s) this is being observed on
> > (the long times between individual steps forward with this problem perhaps
> > warrant repeating the basics each time, as it's otherwise quite cumbersome
> > to always look up old pieces of information).
> > 
> > > I wonder whether the overflow handling should just be removed, or made
> > > conditional on a command-line parameter, or on the 32-bit platform
> > > counter being at least somewhat likely to overflow before a softirq
> > > occurs -- it seems lots of systems are using 14MHz HPET, and that
> > > gives us a couple of minutes for the plt_overflow softirq to do its work
> > before overflow occurs.
> > > I think we would notice that outage in other ways. :)
> > 
> > Iirc we added this for a good reason - to cover the, however unlikely, event
> > of Xen running for very long without preemption.
> > Presumably most of the cases got fixed meanwhile, and indeed a
> > wraparound time on the order of minutes should make this superfluous, but
> > as the case here shows that code did spot a severe anomaly (whatever that
> > may turn out to be).
> > 
> > Also recall that there are HPET implementations around that tick at a much
> > higher frequency than 14MHz.
> > 
> > So unless we finally reach the understanding that the code is flawed, I would
> > rather want to keep it.
> > 
> > Jan
> 
>

Acknowledgement sent to Keir Fraser <keir@xen.org>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Thu, 08 Nov 2012 14:39:03 GMT) (full text, mbox, link).

Message #194 received at 599161@bugs.debian.org (full text, mbox, reply):

On 08/11/2012 14:28, "Ian Campbell" <ijc@hellion.org.uk> wrote:

>>> There appears to be a certain amount of hardware-specificness to the
>>> issue -- so I'm wondering if maybe there are some platforms whose tsc is
>>> not as monotonically increasing as it needs to be...
>> 
>> plt_* timestamps are not derived from TSC at all.
> 
> I see, rather it is derived from the platform_timesource which could be
> HPET, pmtimer, pit etc but in this case (according to the provided xm
> dmesg) appears to be a 14MHz HPET.
> 
> So I guess s/tsc/HPET/ in my original thought...

In fact plt_now and plt_wrap are both derived from the same value of
plt_stamp64. One is derived from it directly, and the other from
plt_stamp64+plt_mask+1 (== plt_stamp64+(1<<32)).

 -- Keir
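
A rough sketch of what the two values should look like, assuming a nominal 14.31818 MHz HPET and modelling the counter-to-nanoseconds conversion as a 32.32 fixed-point multiply. The helper ticks_to_ns is hypothetical and is not Xen's scale_delta(); it only illustrates that any monotonic conversion must place plt_wrap well after plt_now.

```python
HPET_HZ = 14_318_180                 # assumed nominal 14.318 MHz HPET rate
MUL_FRAC = (10**9 << 32) // HPET_HZ  # nanoseconds per tick, 32.32 fixed point


def ticks_to_ns(ticks):
    """Hypothetical stand-in for a tick-to-nanosecond scaling step."""
    return (ticks * MUL_FRAC) >> 32


plt_stamp64 = 0x15B800366A5          # from the debug line in this thread
plt_mask = 0xFFFFFFFF

base = ticks_to_ns(plt_stamp64)
wrapped = ticks_to_ns(plt_stamp64 + plt_mask + 1)

gap_seconds = (wrapped - base) / 1e9
print(round(gap_seconds))            # 300 under these assumptions
```

With these assumed parameters plt_wrap would be expected about five minutes after plt_now, which is what makes the logged ordering so surprising.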

Acknowledgement sent to Mauro <mrsanna1@gmail.com>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Thu, 08 Nov 2012 15:24:03 GMT) (full text, mbox, link).

Message #199 received at 599161@bugs.debian.org (full text, mbox, reply):

On 8 November 2012 14:47,  <Philippe.Simonet@swisscom.com> wrote:
> Hi Mauro,
>
> that's a question for you :
>
>> Philippe, could you clarify again what CPU model(s) this is being observed on
>> (the long times between individual steps forward with this problem perhaps
>> warrant repeating the basics each time, as it's otherwise quite cumbersome
>> to always look up old pieces of information).
>
> can you provide this information ?
>         cat /proc/cpuinfo
>         cat /proc/meminfo
>         hardware information (manufacturer, model, urls, ...)
>

cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Xeon(R) CPU           E7330  @ 2.40GHz
stepping        : 11
cpu MHz         : 2400.128
cache size      : 3072 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu de tsc msr pae mce cx8 apic sep mtrr mca cmov
pat clflush acpi mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc
rep_good aperfmperf pni est ssse3 cx16 hypervisor lahf_lm
bogomips        : 4800.25
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Xeon(R) CPU           E7330  @ 2.40GHz
stepping        : 11
cpu MHz         : 2400.128
cache size      : 3072 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu de tsc msr pae mce cx8 apic sep mtrr mca cmov
pat clflush acpi mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc
rep_good aperfmperf pni est ssse3 cx16 hypervisor lahf_lm
bogomips        : 4800.25
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 2
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Xeon(R) CPU           E7330  @ 2.40GHz
stepping        : 11
cpu MHz         : 2400.128
cache size      : 3072 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu de tsc msr pae mce cx8 apic sep mtrr mca cmov
pat clflush acpi mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc
rep_good aperfmperf pni est ssse3 cx16 hypervisor lahf_lm
bogomips        : 4800.25
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 3
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Xeon(R) CPU           E7330  @ 2.40GHz
stepping        : 11
cpu MHz         : 2400.128
cache size      : 3072 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu de tsc msr pae mce cx8 apic sep mtrr mca cmov
pat clflush acpi mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc
rep_good aperfmperf pni est ssse3 cx16 hypervisor lahf_lm
bogomips        : 4800.25
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 4
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Xeon(R) CPU           E7330  @ 2.40GHz
stepping        : 11
cpu MHz         : 2400.128
cache size      : 3072 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu de tsc msr pae mce cx8 apic sep mtrr mca cmov
pat clflush acpi mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc
rep_good aperfmperf pni est ssse3 cx16 hypervisor lahf_lm
bogomips        : 4800.25
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 5
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Xeon(R) CPU           E7330  @ 2.40GHz
stepping        : 11
cpu MHz         : 2400.128
cache size      : 3072 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu de tsc msr pae mce cx8 apic sep mtrr mca cmov
pat clflush acpi mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc
rep_good aperfmperf pni est ssse3 cx16 hypervisor lahf_lm
bogomips        : 4800.25
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 6
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Xeon(R) CPU           E7330  @ 2.40GHz
stepping        : 11
cpu MHz         : 2400.128
cache size      : 3072 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu de tsc msr pae mce cx8 apic sep mtrr mca cmov
pat clflush acpi mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc
rep_good aperfmperf pni est ssse3 cx16 hypervisor lahf_lm
bogomips        : 4800.25
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 7
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Xeon(R) CPU           E7330  @ 2.40GHz
stepping        : 11
cpu MHz         : 2400.128
cache size      : 3072 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu de tsc msr pae mce cx8 apic sep mtrr mca cmov
pat clflush acpi mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc
rep_good aperfmperf pni est ssse3 cx16 hypervisor lahf_lm
bogomips        : 4800.25
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 8
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Xeon(R) CPU           E7330  @ 2.40GHz
stepping        : 11
cpu MHz         : 2400.128
cache size      : 3072 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu de tsc msr pae mce cx8 apic sep mtrr mca cmov
pat clflush acpi mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc
rep_good aperfmperf pni est ssse3 cx16 hypervisor lahf_lm
bogomips        : 4800.25
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 9
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Xeon(R) CPU           E7330  @ 2.40GHz
stepping        : 11
cpu MHz         : 2400.128
cache size      : 3072 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu de tsc msr pae mce cx8 apic sep mtrr mca cmov
pat clflush acpi mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc
rep_good aperfmperf pni est ssse3 cx16 hypervisor lahf_lm
bogomips        : 4800.25
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 10
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Xeon(R) CPU           E7330  @ 2.40GHz
stepping        : 11
cpu MHz         : 2400.128
cache size      : 3072 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu de tsc msr pae mce cx8 apic sep mtrr mca cmov
pat clflush acpi mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc
rep_good aperfmperf pni est ssse3 cx16 hypervisor lahf_lm
bogomips        : 4800.25
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 11
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Xeon(R) CPU           E7330  @ 2.40GHz
stepping        : 11
cpu MHz         : 2400.128
cache size      : 3072 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu de tsc msr pae mce cx8 apic sep mtrr mca cmov
pat clflush acpi mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc
rep_good aperfmperf pni est ssse3 cx16 hypervisor lahf_lm
bogomips        : 4800.25
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 12
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Xeon(R) CPU           E7330  @ 2.40GHz
stepping        : 11
cpu MHz         : 2400.128
cache size      : 3072 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu de tsc msr pae mce cx8 apic sep mtrr mca cmov
pat clflush acpi mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc
rep_good aperfmperf pni est ssse3 cx16 hypervisor lahf_lm
bogomips        : 4800.25
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 13
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Xeon(R) CPU           E7330  @ 2.40GHz
stepping        : 11
cpu MHz         : 2400.128
cache size      : 3072 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu de tsc msr pae mce cx8 apic sep mtrr mca cmov
pat clflush acpi mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc
rep_good aperfmperf pni est ssse3 cx16 hypervisor lahf_lm
bogomips        : 4800.25
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 14
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Xeon(R) CPU           E7330  @ 2.40GHz
stepping        : 11
cpu MHz         : 2400.128
cache size      : 3072 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu de tsc msr pae mce cx8 apic sep mtrr mca cmov
pat clflush acpi mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc
rep_good aperfmperf pni est ssse3 cx16 hypervisor lahf_lm
bogomips        : 4800.25
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor       : 15
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Xeon(R) CPU           E7330  @ 2.40GHz
stepping        : 11
cpu MHz         : 2400.128
cache size      : 3072 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu de tsc msr pae mce cx8 apic sep mtrr mca cmov
pat clflush acpi mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc
rep_good aperfmperf pni est ssse3 cx16 hypervisor lahf_lm
bogomips        : 4800.25
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:


cat /proc/meminfo
MemTotal:        3127132 kB
MemFree:         2697364 kB
Buffers:           65816 kB
Cached:            62188 kB
SwapCached:            0 kB
Active:           117156 kB
Inactive:          46840 kB
Active(anon):      28668 kB
Inactive(anon):    16360 kB
Active(file):      88488 kB
Inactive(file):    30480 kB
Unevictable:       18944 kB
Mlocked:           18944 kB
SwapTotal:        974840 kB
SwapFree:         974840 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:         54936 kB
Mapped:            14124 kB
Shmem:               508 kB
Slab:              66072 kB
SReclaimable:      19732 kB
SUnreclaim:        46340 kB
KernelStack:        4016 kB
PageTables:         4216 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     2538404 kB
Committed_AS:     202468 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      289904 kB
VmallocChunk:   34359384628 kB
HardwareCorrupted:     0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:     3145728 kB
DirectMap2M:           0 kB


Note that 3G is the memory reserved for dom0; the total amount of memory in
the machine is 64G.


The machine is an HP ProLiant DL580 G5:
http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?objectID=c01154573&lang=en&cc=us&taskId=&prodSeriesId=3454575&prodTypeId=15351

Thanks to all for the work.

Acknowledgement sent to Keir Fraser <keir@xen.org>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Thu, 08 Nov 2012 17:00:08 GMT) (full text, mbox, link).

Message #204 received at 599161@bugs.debian.org (full text, mbox, reply):

On 08/11/2012 16:45, "Tim Deegan" <tim@xen.org> wrote:

>>> I wonder whether the overflow handling should just be removed, or made
>>> conditional on a command-line parameter, or on the 32-bit platform counter
>>> being at least somewhat likely to overflow before a softirq occurs -- it
>>> seems lots of systems are using 14MHz HPET, and that gives us a couple of
>>> minutes for the plt_overflow softirq to do its work before overflow occurs.
>>> I think we would notice that outage in other ways. :)
>> 
>> Iirc we added this for a good reason - to cover the, however
>> unlikely, event of Xen running for very long without preemption.
>> Presumably most of the cases got fixed meanwhile, and indeed
>> a wraparound time on the order of minutes should make this
>> superfluous, but as the case here shows that code did spot a
>> severe anomaly (whatever that may turn out to be).
> 
> ISTR when this code went in we were dealing with a timer that had a
> period of about 4 seconds (ACPI PMTIMER?).  It might well be OTT for the
> HPET, but if there's something weird going on I'd like to track it down
> while we have some sort of a handle on it.

It must have been the PMTIMER. It's the only counter narrower than 32 bits
(legacy PIT we simulate as a 32-bit counter behind the scenes).

 -- Keir
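
A quick calculation, as a sketch only: assuming the standard ACPI PM timer rate of 3.579545 MHz with a 24-bit counter, and a nominal 14.31818 MHz HPET read as a 32-bit counter, the wrap periods work out as below. The function name wrap_seconds is made up for illustration.

```python
def wrap_seconds(bits, hz):
    """Seconds until a free-running counter of the given width wraps."""
    return (1 << bits) / hz


pmtimer = wrap_seconds(24, 3_579_545)  # ACPI PM timer, 24-bit variant
hpet = wrap_seconds(32, 14_318_180)    # HPET main counter, low 32 bits

print(f"PMTIMER wraps after {pmtimer:.2f} s")
print(f"14MHz HPET wraps after {hpet / 60:.1f} min")
```

Under these assumptions the PM timer wraps in a little under 5 seconds, in line with the "about 4 seconds" recollection, while the 14MHz HPET takes about 5 minutes.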

Acknowledgement sent to Tim Deegan <tim@xen.org>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Thu, 08 Nov 2012 17:30:03 GMT) (full text, mbox, link).

Message #209 received at 599161@bugs.debian.org (full text, mbox, reply):

At 09:39 +0000 on 08 Nov (1352367592), Jan Beulich wrote:
> The plt_wrap < plt_now thing of course is entirely unexplainable
> to me too: Considering that plt_scale doesn't change at all
> post-boot, apart from memory corruption I could only see a memory
> access ordering problem to be the reason (platform_timer_stamp
> and/or stime_platform_stamp changing despite platform_timer_lock
> being held). So maybe taking a snapshot of all three static values
> involved in the calculation in __read_platform_stime() between
> acquiring the lock and the first call to __read_platform_stime(),
> and printing them together with the "live" values in a second
> printk() after the one your original patch added could rule that
> out.
>  
> But the box doesn't even seem to be NUMA (of course it also
> doesn't help that the log level was kept restricted - hint, hint,
> Philippe), nor does there appear to be any S3 cycle or pCPU
> bring-up/-down in between...

S3 looks like it might be a culprit, since resume_platform_timer()
clobbers plt_stamp64 without taking the platform_timer_lock.  But both
the S3 resume code and the plt_overflow timer should only ever run on
CPU 0, so even that should be safe (unless continue_hypercall_on_cpu()
is broken...)

Definitely having loglvl=all would have helped here, to eliminate S3
from our enquiries.

> > I wonder whether the overflow handling should just be removed, or made
> > conditional on a command-line parameter, or on the 32-bit platform counter
> > being at least somewhat likely to overflow before a softirq occurs -- it
> > seems lots of systems are using 14MHz HPET, and that gives us a couple of
> > minutes for the plt_overflow softirq to do its work before overflow occurs.
> > I think we would notice that outage in other ways. :)
> 
> Iirc we added this for a good reason - to cover the, however
> unlikely, event of Xen running for very long without preemption.
> Presumably most of the cases got fixed meanwhile, and indeed
> a wraparound time on the order of minutes should make this
> superfluous, but as the case here shows that code did spot a
> severe anomaly (whatever that may turn out to be).

ISTR when this code went in we were dealing with a timer that had a
period of about 4 seconds (ACPI PMTIMER?).  It might well be OTT for the
HPET, but if there's something weird going on I'd like to track it down
while we have some sort of a handle on it.

Tim.

Acknowledgement sent to <Philippe.Simonet@swisscom.com>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Fri, 09 Nov 2012 09:09:06 GMT) (full text, mbox, link).

Message #214 received at 599161@bugs.debian.org (full text, mbox, reply):

> -----Original Message-----
> From: Ian Campbell [mailto:ijc@hellion.org.uk]
> Sent: Thursday, November 08, 2012 3:29 PM
> To: Simonet Philippe, ITS-OUS-OP-IFM-NW-IPE
> Cc: mrsanna1@gmail.com; 599161@bugs.debian.org; xen-
> devel@lists.xen.org; keir@xen.org; JBeulich@suse.com
> Subject: Re: [Xen-devel] #599161: Xen debug patch for the "clock shifts by 50 minutes" bug.
> 
> 
> I think Jan was asking for information relating to the system you saw this on -
> - or are you working on the same systems as Mauro?

Oops, excuse me. Here is a description: I have the problem on 4 systems, all with the same hardware.
The problem occurred on each system about once every 2 months on average. Since January 2012 I have rebooted them all monthly,
and the clock jump occurred only once, in February.

SYSTEM: HP ProLiant DL385 G7, with 2 * AMD Opteron Processor 6174 (12 cores each) = 24 cores, 16 GB memory
XEN:    (XEN) Xen version 4.0.1 (Debian 4.0.1-5.4) (ultrotter@debian.org) (gcc version 4.4.5 (Debian 4.4.5-8) ) Sat Sep  8 19:15:46 UTC 2012
DOM0:   Linux 2.6.32-5-xen-amd64 #1 SMP Sun Sep 23 13:49:30 UTC 2012 x86_64 GNU/Linux
CPU:
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 16
model           : 9
model name      : AMD Opteron(tm) Processor 6174
stepping        : 1
cpu MHz         : 3791872.477
cache size      : 512 KB
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu de tsc msr pae mce cx8 apic mtrr mca cmov pat clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow constant_tsc rep_good nonstop_tsc extd_apicid amd_dcm pni cx16 popcnt hypervisor lahf_lm cmp_legacy extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch nodeid_msr
bogomips        : 4400.17
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

> 
> Of course additional information from Mauro would be useful too in order to
> help spotting any patterns.
> 
> > > Philippe, could you clarify again what CPU model(s) this is being
> > > observed on (the long times between individual steps forward with
> > > this problem perhaps warrant repeating the basics each time, as it's
> > > otherwise quite cumbersome to always look up old pieces of
> information).
> >
> > can you provide this information ?
> > 	cat /proc/cpuinfo
> > 	cat /proc/meminfo
> > 	hardware information (manufacturer, model, urls, ...)
> >
> > Thanks, Philippe
> >
> >
> > > -----Original Message-----
> > > From: Jan Beulich [mailto:JBeulich@suse.com]
> > > Sent: Thursday, November 08, 2012 10:40 AM
> > > To: Simonet Philippe, ITS-OUS-OP-IFM-NW-IPE; Keir Fraser
> > > Cc: 599161@bugs.debian.org; mrsanna1@gmail.com; Ian Campbell; xen-
> > > devel@lists.xen.org
> > > Subject: Re: [Xen-devel] #599161: Xen debug patch for the "clock
> > > shifts by 50 minutes" bug.
> > >
> > > >>> On 07.11.12 at 18:40, Keir Fraser <keir@xen.org> wrote:
> > > > On 07/11/2012 13:22, "Ian Campbell" <ijc@hellion.org.uk> wrote:
> > > >
> > > >>>> (XEN) XXX plt_overflow: plt_now=5ece12d34128
> > > plt_wrap=5ece12d09306
> > > >>>> now=5ece12d16292 old_stamp=35c7c new_stamp=800366a5
> > > >>>> plt_stamp64=15b800366a5 plt_mask=ffffffff tsc=e3839fd23854
> > > >>>> tsc_stamp=e3839fcb0273
> > > >>>
> > > >>> (below is the complete xm dmesg output)
> > > >>>
> > > >>> did that help you ? do you need more info ?
> > > >>
> > > >> I'll leave this to Keir (who wrote the debugging patch) to answer
> > > >> but it looks to me like it should be useful!
> > > >
> > > > I'm scratching my head. plt_wrap is earlier than plt_now, which
> > > > should be impossible. plt_stamp64 oddly has low 32 bits identical
> > > > to new_stamp. That seems very very improbable!
> > >
> > > Is it? My understanding was that plt_stamp64 is just a software
> > > extension to the more narrow HW counter, and hence the low plt_mask
> > > bits would always be expected to be identical.
> > >
> > > > The plt_wrap < plt_now thing of course is entirely unexplainable to me
> > > > too: Considering that plt_scale doesn't change at all post-boot, apart
> > > > from memory corruption I could only see a memory access ordering
> > > > problem to be the reason (platform_timer_stamp and/or
> > > > stime_platform_stamp changing despite platform_timer_lock being
> > > > held). So maybe taking a snapshot of all three static values involved
> > > > in the calculation in __read_platform_stime() between acquiring the
> > > > lock and the first call to __read_platform_stime(), and printing them
> > > > together with the "live" values in a second printk() after the one
> > > > your original patch added, could rule that out.
> > >
> > > > But the box doesn't even seem to be NUMA (of course it also doesn't
> > > > help that the log level was kept restricted - hint, hint, Philippe),
> > > > nor does there appear to be any S3 cycle or pCPU bring-up/-down in
> > > > between...
> > >
> > > Philippe, could you clarify again what CPU model(s) this is being
> > > observed on (the long times between individual steps forward with
> > > this problem perhaps warrant repeating the basics each time, as it's
> > > > otherwise quite cumbersome to always look up old pieces of
> > > > information).
> > >
> > > > I wonder whether the overflow handling should just be removed, or
> > > > made conditional on a command-line parameter, or on the 32-bit
> > > > platform counter being at least somewhat likely to overflow before
> > > > a softirq occurs -- it seems lots of systems are using 14MHz HPET,
> > > > and that gives us a couple of minutes for the plt_overflow softirq
> > > > > to do its work before overflow occurs.
> > > > I think we would notice that outage in other ways. :)
> > >
> > > Iirc we added this for a good reason - to cover the, however
> > > unlikely, event of Xen running for very long without preemption.
> > > Presumably most of the cases got fixed meanwhile, and indeed a
> > > wraparound time on the order of minutes should make this
> > > superfluous, but as the case here shows that code did spot a severe
> > > anomaly (whatever that may turn out to be).
> > >
> > > Also recall that there are HPET implementations around that tick at
> > > a much higher frequency than 14MHz.
> > >
> > > So unless we finally reach the understanding that the code is
> > > flawed, I would rather want to keep it.
> > >
> > > Jan
> >
> >
>
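
Note on the counter arithmetic discussed in the quoted exchange above: the following is a minimal sketch, not the Xen source. It assumes the 64-bit software count is advanced by the masked difference of two hardware reads, which is how Jan describes plt_stamp64, and it assumes a nominal HPET frequency of 14318180 Hz for the "14MHz" figure. The names extend_counter, wrap_period_seconds and HPET_HZ are hypothetical. Replaying the values from the log line shows the low plt_mask bits of plt_stamp64 matching new_stamp, and the assumed frequency gives a 32-bit wrap period of roughly 300 seconds.

```python
# Illustrative model only; this is not the Xen source.
HPET_HZ = 14_318_180  # assumed nominal frequency of a "14MHz" HPET


def extend_counter(stamp64, old_stamp, new_stamp, mask):
    """Advance a 64-bit software count by the masked hardware delta."""
    delta = (new_stamp - old_stamp) & mask
    return stamp64 + delta


def wrap_period_seconds(bits, hz):
    """Seconds until a free-running counter of the given width wraps."""
    return (1 << bits) / hz


# Values taken from the quoted "XXX plt_overflow" log line.
old_stamp, new_stamp = 0x35C7C, 0x800366A5
plt_stamp64, plt_mask = 0x15B800366A5, 0xFFFFFFFF

# Reconstruct the 64-bit value before the update, then replay the update.
previous64 = plt_stamp64 - ((new_stamp - old_stamp) & plt_mask)
replayed64 = extend_counter(previous64, old_stamp, new_stamp, plt_mask)

low_bits_match = (replayed64 & plt_mask) == new_stamp
hpet_wrap_s = wrap_period_seconds(32, HPET_HZ)
print(low_bits_match, round(hpet_wrap_s))
```

Under this scheme the low 32 bits of the extended value always equal the most recent hardware read, so the match in the log is expected rather than improbable.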

Acknowledgement sent to "Jan Beulich" <JBeulich@suse.com>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Fri, 09 Nov 2012 09:51:06 GMT) (full text, mbox, link).

Message #219 received at 599161@bugs.debian.org (full text, mbox, reply):

>>> On 09.11.12 at 10:05, <Philippe.Simonet@swisscom.com> wrote:
> oops, excuse me, here is a description: I have the problem on 4 systems,
> all with the same hardware.
> The problem occurred on each system about once every 2 months on average.
> Since January 2012 I have rebooted them all monthly,
> and the clock jump occurred only once, in February ...
>
> SYSTEM : HP ProLiant DL385 G7, with 2 * AMD Processor 6174 (12 cores) = 24 cores, 16 GB MEMORY
> XEN    : (XEN) Xen version 4.0.1 (Debian 4.0.1-5.4) (ultrotter@debian.org) (gcc version 4.4.5 (Debian 4.4.5-8) ) Sat Sep  8 19:15:46 UTC 2012
> DOM0   : Linux 2.6.32-5-xen-amd64 #1 SMP Sun Sep 23 13:49:30 UTC 2012 x86_64 GNU/Linux
> CPU    :
> processor       : 0
> vendor_id       : AuthenticAMD
> cpu family      : 16
> model           : 9
> model name      : AMD Opteron(tm) Processor 6174

Huh - so we have the problem on even different vendor CPUs (as
Mauro's are Intel ones). But I take it that you haven't seen an event
yet with the debugging patch?

Plus, what's puzzling me a little too - before the occurrence of the
event on Mauro's system, I was under the impression that this
requires quite a bit of uptime. Yet the event he observed occurred
early on the second day after boot afaict.

Jan

Message #224 received at 599161@bugs.debian.org (full text, mbox, reply):

On 9 November 2012 10:47, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 09.11.12 at 10:05, <Philippe.Simonet@swisscom.com> wrote:
>> oops, excuse me, here is a description: I have the problem on 4 systems,
>> all with the same hardware.
>> The problem occurred on each system about once every 2 months on average.
>> Since January 2012 I have rebooted them all monthly,
>> and the clock jump occurred only once, in February ...
>>
>> SYSTEM : HP ProLiant DL385 G7, with 2 * AMD Processor 6174 (12 cores) = 24 cores, 16 GB MEMORY
>> XEN    : (XEN) Xen version 4.0.1 (Debian 4.0.1-5.4) (ultrotter@debian.org) (gcc version 4.4.5 (Debian 4.4.5-8) ) Sat Sep  8 19:15:46 UTC 2012
>> DOM0   : Linux 2.6.32-5-xen-amd64 #1 SMP Sun Sep 23 13:49:30 UTC 2012 x86_64 GNU/Linux
>> CPU    :
>> processor       : 0
>> vendor_id       : AuthenticAMD
>> cpu family      : 16
>> model           : 9
>> model name      : AMD Opteron(tm) Processor 6174
>
> Huh - so we have the problem on even different vendor CPUs (as
> Mauro's are Intel ones). But I take it that you haven't seen an event
> yet with the debugging patch?
>
> Plus, what's puzzling me a little too - before the occurrence of the
> event on Mauro's system, I was under the impression that this
> requires quite a bit of uptime. Yet the event he observed occurred
> early on the second day after boot afaict.

Before the patch, clock jumps on my systems occurred about once or twice a week.

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#599161; Package src:linux-2.6. (Tue, 13 Nov 2012 10:18:03 GMT) (full text, mbox, link).

Acknowledgement sent to "Jan Beulich" <JBeulich@suse.com>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Tue, 13 Nov 2012 10:18:03 GMT) (full text, mbox, link).

Message #229 received at 599161@bugs.debian.org (full text, mbox, reply):

[Message part 1 (text/plain, inline)]

>>> On 09.11.12 at 10:05, <Philippe.Simonet@swisscom.com> wrote:

Since it looks like this got stalled again, attached is a slightly
extended version of Keir's debugging patch, allowing to rule out
any inconsistencies of the globals between the first and second
instances of the two invocations of __read_platform_stime().

Should the numbers printed turn out identical between the two
invocations and identical to the boot time determined values, then
I'm afraid I'm out of explanations as well as debugging suggestions.

Please remember to add "loglvl=all" to the hypervisor command
line.

The patch is against a 4.0.3 based tree I had still lying around, so
I hope it'll apply cleanly to your 4.0.1 based one.

Jan

[00-tsc-debug (application/octet-stream, attachment)]

Acknowledgement sent to Ian Campbell <ijc@hellion.org.uk>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Mon, 26 Nov 2012 15:39:03 GMT) (full text, mbox, link).

Message #234 received at 599161@bugs.debian.org (full text, mbox, reply):

[Message part 1 (text/plain, inline)]

On Mon, 2012-11-26 at 15:28 +0000, Jan Beulich wrote:
> >>> On 24.11.12 at 13:06, Mauro <mrsanna1@gmail.com> wrote:
> > Here is a clock jump, I'm using the patch:
> 
> And here's a 4.0.x version of the patch I just sent for -unstable.

Thanks Jan! CCing the Debian bug for posterity.

Mauro, Perhaps you could give this patch a try to confirm that it is
effective?

Ian.
-- 
Ian Campbell
Current Noise: Morbid Angel - Chapel Of Ghouls (Remix)

Paranoia is simply an optimistic outlook on life.

[x86-time-scale-asm.patch (text/x-patch, attachment)]

Added tag(s) fixed-upstream and patch. Request was from Axel Beckert <abe@debian.org> to control@bugs.debian.org. (Fri, 30 Nov 2012 00:09:05 GMT) (full text, mbox, link).

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#599161; Package src:linux-2.6. (Sun, 09 Dec 2012 19:57:03 GMT) (full text, mbox, link).

Acknowledgement sent to <Philippe.Simonet@swisscom.com>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Sun, 09 Dec 2012 19:57:03 GMT) (full text, mbox, link).

Message #241 received at 599161@bugs.debian.org (full text, mbox, reply):

Hi Xen Developers,

Mauro confirmed that he has not had any problems over the last few days with this patch installed.
What needs to be done for it to be made official / integrated into the next version in Debian?

Thanks and regards

Philippe




> -----Original Message-----
> From: Ian Campbell [mailto:ijc@hellion.org.uk]
> Sent: Monday, November 26, 2012 4:36 PM
> To: Jan Beulich
> Cc: Mauro; Simonet Philippe, ITS-OUS-OP-IFM-NW-IPE; Keir Fraser;
> 599161@bugs.debian.org
> Subject: Re: [Xen-devel] #599161: Xen debug patch for the "clock shifts by 50
> minutes" bug.
> 
> On Mon, 2012-11-26 at 15:28 +0000, Jan Beulich wrote:
> > >>> On 24.11.12 at 13:06, Mauro <mrsanna1@gmail.com> wrote:
> > > Here is a clock jump, I'm using the patch:
> >
> > And here's a 4.0.x version of the patch I just sent for -unstable.
> 
> Thanks Jan! CCing the Debian bug for posterity.
> 
> Mauro, Perhaps you could give this patch a try to confirm that it is effective?
> 
> Ian.
> --
> Ian Campbell
> Current Noise: Morbid Angel - Chapel Of Ghouls (Remix)
> 
> Paranoia is simply an optimistic outlook on life.

Acknowledgement sent to Ian Campbell <ijc@hellion.org.uk>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Mon, 10 Dec 2012 10:54:03 GMT) (full text, mbox, link).

Message #246 received at 599161@bugs.debian.org (full text, mbox, reply):

reassign 599161 src:xen
thanks

On Sun, 2012-12-09 at 19:54 +0000, Philippe.Simonet@swisscom.com wrote:
> What needs to be done for it to be made official / integrated into
> the next version in Debian?

That is one for the Debian maintainers, not the upstream Xen maintainers
CCd here. AIUI the intention is to fix this soon.

I have reassigned the bug from the kernel to Xen, where it belongs.

Ian.
-- 
Ian Campbell
Current Noise: Weedeater - Weed Monkey

Those who do things in a noble spirit of self-sacrifice are to be avoided
at all costs.
		-- N. Alexander.

Bug reassigned from package 'src:linux-2.6' to 'src:xen'. Request was from Ian Campbell <ijc@hellion.org.uk> to control@bugs.debian.org. (Mon, 10 Dec 2012 10:54:05 GMT) (full text, mbox, link).

No longer marked as found in versions linux-2.6/2.6.32-21. Request was from Ian Campbell <ijc@hellion.org.uk> to control@bugs.debian.org. (Mon, 10 Dec 2012 10:54:05 GMT) (full text, mbox, link).

Reply sent to Bastian Blank <waldi@debian.org>:
You have taken responsibility. (Tue, 11 Dec 2012 19:21:08 GMT) (full text, mbox, link).

Notification sent to Mark Adams <mark@campbell-lange.net>:
Bug acknowledged by developer. (Tue, 11 Dec 2012 19:21:08 GMT) (full text, mbox, link).

Message #255 received at 599161-close@bugs.debian.org (full text, mbox, reply):

Source: xen
Source-Version: 4.1.3-7

We believe that the bug you reported is fixed in the latest version of
xen, which is due to be installed in the Debian FTP archive.

A summary of the changes between this version and the previous one is
attached.

Thank you for reporting the bug, which will now be closed.  If you
have further comments please address them to 599161@bugs.debian.org,
and the maintainer will reopen the bug report if appropriate.

Debian distribution maintenance software
pp.
Bastian Blank <waldi@debian.org> (supplier of updated xen package)

(This message was generated automatically at their request; if you
believe that there is a problem with it please contact the archive
administrators by mailing ftpmaster@debian.org)


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Format: 1.8
Date: Tue, 11 Dec 2012 18:54:59 +0100
Source: xen
Binary: xen-docs-4.1 libxen-4.1 libxenstore3.0 libxen-dev xenstore-utils libxen-ocaml libxen-ocaml-dev xen-utils-common xen-utils-4.1 xen-hypervisor-4.1-amd64 xen-system-amd64 xen-hypervisor-4.1-i386 xen-system-i386
Architecture: source amd64 all
Version: 4.1.3-7
Distribution: unstable
Urgency: low
Maintainer: Debian Xen Team <pkg-xen-devel@lists.alioth.debian.org>
Changed-By: Bastian Blank <waldi@debian.org>
Description: 
 libxen-4.1 - Public libs for Xen
 libxen-dev - Public headers and libs for Xen
 libxen-ocaml - OCaml libraries for controlling Xen
 libxen-ocaml-dev - OCaml libraries for controlling Xen (devel package)
 libxenstore3.0 - Xenstore communications library for Xen
 xen-docs-4.1 - Documentation for Xen
 xen-hypervisor-4.1-amd64 - Xen Hypervisor on AMD64
 xen-hypervisor-4.1-i386 - Xen Hypervisor on i386
 xen-system-amd64 - Xen System on AMD64 (meta-package)
 xen-system-i386 - Xen System on i386 (meta-package)
 xen-utils-4.1 - XEN administrative tools
 xen-utils-common - Xen administrative tools - common files
 xenstore-utils - Xenstore utilities for Xen
Closes: 599161 695056
Changes: 
 xen (4.1.3-7) unstable; urgency=low
 .
   * Fix clock jump due to incorrect annotated inline assembler.
     (closes: #599161)
   * Add support for XZ compressed Linux kernels to hypervisor and userspace
     based loaders, it is needed for any Linux kernels newer then Wheezy.
     (closes: #695056)
Checksums-Sha1: 
 7863b02c1ec958c62cf7a744fab555c974e8a71e 2389 xen_4.1.3-7.dsc
 5801a92ec7bedc4737f3da8e38bed1c55a668f5b 153180 xen_4.1.3-7.debian.tar.gz
 f1416a6a567779a46c3bb2031b34d5af599d959b 756926 xen-hypervisor-4.1-amd64_4.1.3-7_amd64.deb
 2936daf24bc9fedbd8b8c8278b7b4a2f59cccf52 17660 xen-system-amd64_4.1.3-7_amd64.deb
 adfa7fbdff2216a2fdd2f735912f380bfc6d7714 1171612 xen-docs-4.1_4.1.3-7_all.deb
 6d8515551bfadecd24e1256a06b6668872d6b6a8 78986 xen-utils-common_4.1.3-7_all.deb
 dee5f8f12ec592934c1f11472873102acdc6bb79 290546 libxen-dev_4.1.3-7_amd64.deb
 c766b40b36266248fc5ffc1bb74ec88b52756b9a 88338 libxen-ocaml-dev_4.1.3-7_amd64.deb
 b0e303b23b220e8c3d9136cfcb24d07aaa30cec9 28958 libxenstore3.0_4.1.3-7_amd64.deb
 c93e024b121f76a12465a8e075c3a7ec1571a212 139258 libxen-4.1_4.1.3-7_amd64.deb
 dc4729a94dcba5a7c7fd56e61a46613bbd13e7cf 62780 libxen-ocaml_4.1.3-7_amd64.deb
 91a04838d6c5549a446d1207e6ce9b6018158582 26284 xenstore-utils_4.1.3-7_amd64.deb
 487eaccaae88d5337363acb3f98e50fa388b57ff 1607676 xen-utils-4.1_4.1.3-7_amd64.deb
Checksums-Sha256: 
 910ecbadcfd655c32c9e24d191fdcdf25b2619257e1b05bce905f65e9663b01a 2389 xen_4.1.3-7.dsc
 f2515f13847f64c006daee9c7d6a8bc4fc70078e237efcbe45a5b083cea807ee 153180 xen_4.1.3-7.debian.tar.gz
 76dd14d4318a35d742ec886aa54e0b30d983f152dffe023db1291beb7db4455e 756926 xen-hypervisor-4.1-amd64_4.1.3-7_amd64.deb
 0e24e4f3505a719ff4eceac959d3e94811c0fa390fd9bb874b1012444cf8281d 17660 xen-system-amd64_4.1.3-7_amd64.deb
 c49c6a83afda94cd2555e2aedd393e22a457d4dc4b49ff9df3ee9f5bccdc96fc 1171612 xen-docs-4.1_4.1.3-7_all.deb
 cff8e1548c5dfd10c2bbfedad60ed7fae8d4b0e74b540658580c4545c011c660 78986 xen-utils-common_4.1.3-7_all.deb
 410e2273fb421c3cefb58ecc1b573571f1ba69c7bf0f70dc5c5b86db1d3ff606 290546 libxen-dev_4.1.3-7_amd64.deb
 93654784745fea6087a37c246c2ec374bc3e061a00cfdd701584dec8d8b630cf 88338 libxen-ocaml-dev_4.1.3-7_amd64.deb
 dc9903b595e7f4e933f94637d9bcda03ccc6d942d1fc2407746f7cf9577a8f2e 28958 libxenstore3.0_4.1.3-7_amd64.deb
 fa484adabe47d9cb26bf1d75c20c2b278b45a52555cd0377f609fd6b54f73673 139258 libxen-4.1_4.1.3-7_amd64.deb
 1525c542de919386cfe490282549d96b194c98dcbf95850c59bd1ab624b3fa3b 62780 libxen-ocaml_4.1.3-7_amd64.deb
 0e408d85185359204195d23abcf312cb4cf2609c6df4c16fa29829258dfc410e 26284 xenstore-utils_4.1.3-7_amd64.deb
 ea7f8d0a9cbb910dae63609fe474150aca2bcf3d371c4423ba748c2cf6a00526 1607676 xen-utils-4.1_4.1.3-7_amd64.deb
Files: 
 a44d33f1d8723dc8021bc05354ebda62 2389 kernel optional xen_4.1.3-7.dsc
 f75022770cc5563a1307561ec298cd30 153180 kernel optional xen_4.1.3-7.debian.tar.gz
 4dee8212e7b3359f4b743180a31dff4c 756926 kernel optional xen-hypervisor-4.1-amd64_4.1.3-7_amd64.deb
 6053cf588ecb46f439bdda5f7875ab93 17660 kernel optional xen-system-amd64_4.1.3-7_amd64.deb
 85426c9d42b5fd24376b7d4754f3e167 1171612 doc optional xen-docs-4.1_4.1.3-7_all.deb
 5418a20b9fdebc8dae885894089875f3 78986 kernel optional xen-utils-common_4.1.3-7_all.deb
 3ca54190ca203cf805553fa29d7a1cb2 290546 libdevel optional libxen-dev_4.1.3-7_amd64.deb
 da45a45ea0c827d2ef52d58d34749557 88338 ocaml optional libxen-ocaml-dev_4.1.3-7_amd64.deb
 f946c79fadcdb625120b54b5a9405b84 28958 libs optional libxenstore3.0_4.1.3-7_amd64.deb
 b0a158f3f352c6fb258924d21f0fa882 139258 libs optional libxen-4.1_4.1.3-7_amd64.deb
 7936c829277b683b27da61b3c47bd390 62780 ocaml optional libxen-ocaml_4.1.3-7_amd64.deb
 77851bd209e8ca5a32880846181a78aa 26284 admin optional xenstore-utils_4.1.3-7_amd64.deb
 409c49afcb80a5bdc1c2bdc3ff953f4d 1607676 kernel optional xen-utils-4.1_4.1.3-7_amd64.deb

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)

iEYEARECAAYFAlDHhhoACgkQLkAIIn9ODhF0UQCfd630UELL2jMecoI5vN/IkDEA
7CUAn3laimP70OuODd9QMgAbP6iHQ/lr
=GArX
-----END PGP SIGNATURE-----

Reply sent to Bastian Blank <waldi@debian.org>:
You have taken responsibility. (Tue, 11 Dec 2012 19:21:09 GMT) (full text, mbox, link).

Notification sent to Antoine Beaupre <anarcat@debian.org>:
Bug acknowledged by developer. (Tue, 11 Dec 2012 19:21:09 GMT) (full text, mbox, link).

Marked as found in versions xen/4.0.1-5.5. Request was from Thomas Goirand <thomas@goirand.fr> to control@bugs.debian.org. (Thu, 20 Dec 2012 10:33:03 GMT) (full text, mbox, link).

Bug reopened Request was from Thomas Goirand <thomas@goirand.fr> to control@bugs.debian.org. (Thu, 20 Dec 2012 10:33:03 GMT) (full text, mbox, link).

No longer marked as fixed in versions xen/4.1.3-7. Request was from Thomas Goirand <thomas@goirand.fr> to control@bugs.debian.org. (Thu, 20 Dec 2012 10:33:04 GMT) (full text, mbox, link).

Marked as fixed in versions xen/4.1.3-7. Request was from Thomas Goirand <thomas@goirand.fr> to control@bugs.debian.org. (Thu, 20 Dec 2012 10:33:05 GMT) (full text, mbox, link).

Marked Bug as done Request was from Bastian Blank <waldi@debian.org> to control@bugs.debian.org. (Thu, 20 Dec 2012 10:54:13 GMT) (full text, mbox, link).

Notification sent to Mark Adams <mark@campbell-lange.net>:
Bug acknowledged by developer. (Thu, 20 Dec 2012 10:54:13 GMT) (full text, mbox, link).

Reply sent to Thomas Goirand <zigo@debian.org>:
You have taken responsibility. (Sun, 30 Dec 2012 21:03:14 GMT) (full text, mbox, link).

Notification sent to Mark Adams <mark@campbell-lange.net>:
Bug acknowledged by developer. (Sun, 30 Dec 2012 21:03:15 GMT) (full text, mbox, link).

Message #276 received at 599161-close@bugs.debian.org (full text, mbox, reply):

Source: xen
Source-Version: 4.0.1-5.6

We believe that the bug you reported is fixed in the latest version of
xen, which is due to be installed in the Debian FTP archive.

A summary of the changes between this version and the previous one is
attached.

Thank you for reporting the bug, which will now be closed.  If you
have further comments please address them to 599161@bugs.debian.org,
and the maintainer will reopen the bug report if appropriate.

Debian distribution maintenance software
pp.
Thomas Goirand <zigo@debian.org> (supplier of updated xen package)

(This message was generated automatically at their request; if you
believe that there is a problem with it please contact the archive
administrators by mailing ftpmaster@debian.org)


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Format: 1.8
Date: Wed, 26 Dec 2012 13:18:34 +0000
Source: xen
Binary: xen-docs-4.0 libxenstore3.0 libxen-dev xenstore-utils xen-utils-4.0 xen-hypervisor-4.0-amd64 xen-hypervisor-4.0-i386
Architecture: source all amd64
Version: 4.0.1-5.6
Distribution: stable-proposed-updates
Urgency: low
Maintainer: Debian Xen Team <pkg-xen-devel@lists.alioth.debian.org>
Changed-By: Thomas Goirand <zigo@debian.org>
Description: 
 libxen-dev - Public headers and libs for Xen
 libxenstore3.0 - Xenstore communications library for Xen
 xen-docs-4.0 - Documentation for Xen
 xen-hypervisor-4.0-amd64 - The Xen Hypervisor on AMD64
 xen-hypervisor-4.0-i386 - The Xen Hypervisor on i386
 xen-utils-4.0 - XEN administrative tools
 xenstore-utils - Xenstore utilities for Xen
Closes: 599161
Changes: 
 xen (4.0.1-5.6) stable-proposed-updates; urgency=low
 .
   * Non-maintainer upload, previously discussed with Guido.
   * Fixes Xen clock long standing issue, eg: fix scale_delta() inline assembly,
   causing domU offset and possibly leading to crashes (Closes: #599161). Thanks
   to Ian Campbell <ijc@hellion.org.uk> for forwarding the patch to the Debian
   BTS, and Jan Beulich <jbeulich@suse.com> for working on an upstream patch.
Checksums-Sha1: 
 ca6ca68cce7aa942f5ad2f51b22e3420e655e73e 1450 xen_4.0.1-5.6.dsc
 d0734ac8b09a3256c7db161f4b204aa8e3804593 70875 xen_4.0.1-5.6.debian.tar.gz
 83234840b252a4710a768ab6b4971260c8fde41f 1318404 xen-docs-4.0_4.0.1-5.6_all.deb
 76c21dc02a78825ac1c97d92f3a3ca3169767815 690564 xen-hypervisor-4.0-amd64_4.0.1-5.6_amd64.deb
 96d454e9be5d161606e222e0e3e0240de715f630 260952 libxen-dev_4.0.1-5.6_amd64.deb
 98790cb7d1924e5ba0097389aa8ead61250c6533 24746 libxenstore3.0_4.0.1-5.6_amd64.deb
 a875574d61ee8e1ced9387401b00cc496b7f15f9 1004874 xen-utils-4.0_4.0.1-5.6_amd64.deb
 e79351cf9dbae6c29c3bed69b35f1c8642000ad4 21360 xenstore-utils_4.0.1-5.6_amd64.deb
Checksums-Sha256: 
 d797899a6a1c2326a66cc80a807ea1b59d45d37924c3e71131db2a43242d69b7 1450 xen_4.0.1-5.6.dsc
 955e19f896b7596cf083a95ed4e2bf9cfd81a15384d27ce24c853227a416c4f1 70875 xen_4.0.1-5.6.debian.tar.gz
 fa34262f0a373cef22ca1aee64a39c041edcf14d5d2009b7f79b366257f265be 1318404 xen-docs-4.0_4.0.1-5.6_all.deb
 087c07617dc197221f61f8a200776708ea44eedbd917fe8425db6bc94e550efe 690564 xen-hypervisor-4.0-amd64_4.0.1-5.6_amd64.deb
 74689bc10f026ab739646efe695c5e788f02b85fca7ee689a2cec317aef9242d 260952 libxen-dev_4.0.1-5.6_amd64.deb
 51b8a1c1ecb0573540cfbcdc24edcfaf8e111ea4685a630301d5a1fb1f774294 24746 libxenstore3.0_4.0.1-5.6_amd64.deb
 77d8c662ab9715ef0b7ad5c3e078cf05f2267f191eece99e17089ad4bb34a81a 1004874 xen-utils-4.0_4.0.1-5.6_amd64.deb
 6079a0c70b3547eb02b3ad3c35c0d892d612efb423bb45977fc1c65100e75a52 21360 xenstore-utils_4.0.1-5.6_amd64.deb
Files: 
 64803fadb724e015517067667d9ea20e 1450 kernel optional xen_4.0.1-5.6.dsc
 2f2e797e2fb3af3ebac096b65318987d 70875 kernel optional xen_4.0.1-5.6.debian.tar.gz
 78265efbc4a4863ddea4cdeaf2b2e5be 1318404 doc optional xen-docs-4.0_4.0.1-5.6_all.deb
 ae58a44a048187a5ca69346915dcbead 690564 kernel optional xen-hypervisor-4.0-amd64_4.0.1-5.6_amd64.deb
 2796a321d20ad72e7f273d8a5aa14f91 260952 libdevel optional libxen-dev_4.0.1-5.6_amd64.deb
 f2093b7c82148f63b53bdf77b6541eef 24746 libs optional libxenstore3.0_4.0.1-5.6_amd64.deb
 d7839c041e69862932320d427d625b68 1004874 kernel optional xen-utils-4.0_4.0.1-5.6_amd64.deb
 89c412561393156cc1e43fb0f7702c87 21360 admin optional xenstore-utils_4.0.1-5.6_amd64.deb

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)

iEYEARECAAYFAlDbAXsACgkQl4M9yZjvmklfKwCg3rU3qXXMd5mkMH6ZJqbjvB6Z
5WwAoMC56g14yA2B4iz3EqJXVnceO+BG
=kuem
-----END PGP SIGNATURE-----
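
Note on the scale_delta() fix named in the changelog above: the changelog attributes the clock jump to incorrectly annotated inline assembly, which is a compiler-level issue that cannot be reproduced outside C. The sketch below only models the arithmetic such a function is intended to perform, assuming a (shift, mul_frac) fixed-point scale where mul_frac is a 32-bit fraction. The make_scale helper and the 14318180 Hz frequency are assumptions for illustration and are not taken from the Debian or Xen patches.

```python
# Illustrative model only; names follow the changelog, not the Xen source.
U64 = (1 << 64) - 1
NS_PER_SEC = 1_000_000_000


def make_scale(ticks_per_sec):
    """Hypothetical helper: derive (shift, mul_frac) for ticks -> ns."""
    shift = 0
    tps = ticks_per_sec
    while tps > 2 * NS_PER_SEC:
        tps >>= 1
        shift -= 1
    while tps <= NS_PER_SEC:
        tps <<= 1
        shift += 1
    mul_frac = (NS_PER_SEC << 32) // tps  # 0.32 fixed-point fraction
    return shift, mul_frac


def scale_delta(delta, shift, mul_frac):
    """Pre-shift delta, multiply by the fraction, drop the low 32 bits."""
    if shift < 0:
        delta >>= -shift
    else:
        delta = (delta << shift) & U64
    # Keep the low 64 bits of the 128-bit product shifted right by 32.
    return ((delta * mul_frac) >> 32) & U64


shift, mul_frac = make_scale(14_318_180)  # assumed HPET frequency
one_second_ns = scale_delta(14_318_180, shift, mul_frac)
print(shift, one_second_ns)
```

One second's worth of ticks should scale to within a nanosecond of 10^9. A wrong operand in the multiply step would produce an arbitrary result, which is consistent with, but does not prove, the kind of time jump reported in this bug.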

Reply sent to Thomas Goirand <zigo@debian.org>:
You have taken responsibility. (Sun, 30 Dec 2012 21:03:15 GMT) (full text, mbox, link).

Notification sent to Antoine Beaupre <anarcat@debian.org>:
Bug acknowledged by developer. (Sun, 30 Dec 2012 21:03:15 GMT) (full text, mbox, link).

Bug archived. Request was from Debbugs Internal Request <owner@bugs.debian.org> to internal_control@bugs.debian.org. (Sun, 17 Mar 2013 07:25:38 GMT) (full text, mbox, link).


Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Sun Jan 25 21:03:03 2026; Machine Name: bembo
