Debian Bug report logs - #599816
Kernel panic on 36 and more encapsulated GRE tunnels


Package: linux-2.6; Maintainer for linux-2.6 is Debian Kernel Team <debian-kernel@lists.debian.org>;

Reported by: Beatrice Barbe <beatrice.barbe@gmail.com>

Date: Mon, 11 Oct 2010 14:57:05 UTC

Severity: normal

Fixed in version 2.6.37-1

Done: Moritz Muehlenhoff <jmm@inutil.org>

Bug is archived. No further changes may be made.




Report forwarded to debian-bugs-dist@lists.debian.org, unknown-package@qa.debian.org:
Bug#599816; Package debian gnu/linux. (Mon, 11 Oct 2010 14:57:08 GMT) Full text and rfc822 format available.

Acknowledgement sent to Beatrice Barbe <beatrice.barbe@gmail.com>:
New Bug report received and forwarded. Copy sent to unknown-package@qa.debian.org. (Mon, 11 Oct 2010 14:57:08 GMT) Full text and rfc822 format available.

Message #5 received at submit@bugs.debian.org (full text, mbox):

From: Beatrice Barbe <beatrice.barbe@gmail.com>
To: submit@bugs.debian.org
Subject: Kernel panic on 36 and more encapsulated GRE tunnels
Date: Mon, 11 Oct 2010 16:53:23 +0200
[Message part 1 (text/plain, inline)]
Package: Debian GNU/Linux
Version: 5.0.6

When creating 36 or more GRE tunnels, with the script attached to the mail,
and sending a packet, I got a kernel panic.
The last line in syslog is: “GRE over IPv4 tunneling driver”

# tunnels.sh 37

%ping -I 192.168.9.1 192.168.10.1
->kernel panic


-- 
Béatrice Barbe
INSA engineer, class of 2010
Specialization: Telecommunications Networks
[Message part 2 (text/html, inline)]
[tunnels.sh (application/x-sh, attachment)]
[network-configuration.jpg (image/jpeg, attachment)]

Bug reassigned from package 'debian gnu/linux' to 'linux-2.6'. Request was from Mike Hommey <glandium@debian.org> to control@bugs.debian.org. (Tue, 12 Oct 2010 07:42:03 GMT) Full text and rfc822 format available.

Bug No longer marked as found in versions 5.0.6. Request was from Mike Hommey <glandium@debian.org> to control@bugs.debian.org. (Tue, 12 Oct 2010 07:42:04 GMT) Full text and rfc822 format available.

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#599816; Package linux-2.6. (Wed, 13 Oct 2010 04:06:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to Ben Hutchings <ben@decadent.org.uk>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Wed, 13 Oct 2010 04:06:03 GMT) Full text and rfc822 format available.

Message #14 received at 599816@bugs.debian.org (full text, mbox):

From: Ben Hutchings <ben@decadent.org.uk>
To: Beatrice Barbe <beatrice.barbe@gmail.com>
Cc: 599816@bugs.debian.org
Subject: Re: Kernel panic on 36 and more encapsulated GRE tunnels
Date: Wed, 13 Oct 2010 05:02:27 +0100
[Message part 1 (text/plain, inline)]
On Mon, 2010-10-11 at 16:53 +0200, Beatrice Barbe wrote:
> Package: Debian GNU/Linux
> Version: 5.0.6
> 
> When creating 36 or more GRE tunnels, with the script attached to the
> mail, and sending a packet, I got a kernel panic.
> The last line in syslog is: “GRE over IPv4 tunneling driver”
> 
> # tunnels.sh 37
> 
> %ping -I 192.168.9.1 192.168.10.1
> ->kernel panic

Hmm, that's a weird bug.  I can reproduce it in Debian stable (Linux
2.6.26) though it is fixed in testing (Linux 2.6.32).

The panic messages I get are:

[   71.391683] BUG: scheduling while atomic: ping/2163/0xd7a8f000
[   71.392047] Pid: 2163, comm: ping Not tainted 2.6.26-2-686 #1
[   71.392047]  [<c02b86f2>] schedule+0x70/0x66f
[   71.392047]  [<c0126372>] sys_gettimeofday+0x27/0x53
[   71.392047]  [<c0103976>] work_resched+0x5/0x28
[   71.392047]  =======================
[   71.392047] BUG: unable to handle kernel paging request at 5c3b6400
[   71.392047] IP: [<c01187e9>] cpuacct_charge+0x29/0x34
[   71.392047] *pde = 00000000 
[   71.392047] Oops: 0000 [#1] SMP 
[   71.392047] Modules linked in: ip_gre loop snd_pcm snd_timer snd soundcore snd_page_alloc parport_pc parport serio_raw pcspkr psmouse i2c_piix4 button i2c_core evdev ext3 jbd mbcache ide_cd_mod cdrom ide_disk ata_generic libata scsi_mod dock floppy e1000 piix ide_pci_generic ide_core thermal processor fan thermal_sys [last unloaded: scsi_wait_scan]
[   71.392047] 
[   71.392047] Pid: 2163, comm: ping Not tainted (2.6.26-2-686 #1)
[   71.392047] EIP: 0060:[<c01187e9>] EFLAGS: 00010086 CPU: 0
[   71.392047] EIP is at cpuacct_charge+0x29/0x34
[   71.392047] EAX: df80b200 EBX: 00000000 ECX: 003ca62e EDX: df2eac80
[   71.392047] ESI: df89fa00 EDI: 08050440 EBP: df89fa00 ESP: de845f54
[   71.392047]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[   71.392047] Process ping (pid: 2163, ti=de844000 task=df89fa00 task.ti=de844000)
[   71.392047] Stack: df89fa28 c1409ffc c011f6ee c1409fc0 00000040 c02b8999 00000040 00000003 
[   71.392047]        00000040 df89fb90 c1409fc0 00000000 00000003 0804f2c0 00000000 00000003 
[   71.392047]        0804f2c0 00000000 c0126372 4cb52b61 00000010 00000040 08050440 de844000 
[   71.392047] Call Trace:
[   71.392047]  [<c011f6ee>] put_prev_task_fair+0x17/0x37
[   71.392047]  [<c02b8999>] schedule+0x317/0x66f
[   71.392047]  [<c0126372>] sys_gettimeofday+0x27/0x53
[   71.392047]  [<c0103976>] work_resched+0x5/0x28
[   71.392047]  =======================
[   71.392047] Code: 14 c3 83 3d b0 10 35 c0 00 56 89 c6 53 89 cb 89 d1 74 20 8b 80 d8 03 00 00 8b 40 28 85 c0 74 13 8b 56 04 8b 40 0c 8b 52 10 f7 d0 <8b> 04 90 01 08 11 58 04 5b 5e c3 55 57 bf 3f 00 00 00 56 53 83 
[   71.392047] EIP: [<c01187e9>] cpuacct_charge+0x29/0x34 SS:ESP 0068:de845f54
[   71.392047] Kernel panic - not syncing: Fatal exception in interrupt

It looks like the send() call returns with a spinlock held.  I haven't
yet found the change between versions 2.6.26 and 2.6.32 that fixed this,
but I will keep looking.

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.
[signature.asc (application/pgp-signature, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#599816; Package linux-2.6. (Thu, 14 Oct 2010 03:33:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to Ben Hutchings <ben@decadent.org.uk>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Thu, 14 Oct 2010 03:33:03 GMT) Full text and rfc822 format available.

Message #19 received at 599816@bugs.debian.org (full text, mbox):

From: Ben Hutchings <ben@decadent.org.uk>
To: Beatrice Barbe <beatrice.barbe@gmail.com>
Cc: 599816@bugs.debian.org
Subject: Re: Kernel panic on 36 and more encapsulated GRE tunnels
Date: Thu, 14 Oct 2010 04:29:46 +0100
[Message part 1 (text/plain, inline)]
On Wed, 2010-10-13 at 05:02 +0100, Ben Hutchings wrote:
> On Mon, 2010-10-11 at 16:53 +0200, Beatrice Barbe wrote:
> > Package: Debian GNU/Linux
> > Version: 5.0.6
> > 
> > When creating 36 or more GRE tunnels, with the script attached to the
> > mail, and sending a packet, I got a kernel panic.
> > The last line in syslog is: “GRE over IPv4 tunneling driver”
> > 
> > # tunnels.sh 37
> > 
> > %ping -I 192.168.9.1 192.168.10.1
> > ->kernel panic
> 
> Hmm, that's a weird bug.  I can reproduce it in Debian stable (Linux
> 2.6.26) though it is fixed in testing (Linux 2.6.32).

> It looks like the send() call returns with a spinlock held.  I haven't
> yet found the change between versions 2.6.26 and 2.6.32 that fixed this,
> but I will keep looking.

In fact this has not been fixed, and I am able to reproduce it with the
very latest version of Linux.

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.
[signature.asc (application/pgp-signature, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#599816; Package linux-2.6. (Thu, 14 Oct 2010 04:03:05 GMT) Full text and rfc822 format available.

Acknowledgement sent to Ben Hutchings <ben@decadent.org.uk>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Thu, 14 Oct 2010 04:03:05 GMT) Full text and rfc822 format available.

Message #24 received at 599816@bugs.debian.org (full text, mbox):

From: Ben Hutchings <ben@decadent.org.uk>
To: netdev@vger.kernel.org
Cc: Beatrice Barbe <beatrice.barbe@gmail.com>, 599816@bugs.debian.org
Subject: Nested GRE locking bug
Date: Thu, 14 Oct 2010 05:00:42 +0100
[Message part 1 (text/plain, inline)]
Beatrice Barbe reported a reproducible crash after creating large
numbers of nested GRE tunnels and then pinging with the source address
forced.  I was able to reproduce this using net-2.6.  I'm attaching the
kernel config I used and a script to reproduce this based on the script
she provided.  The magic number of tunnels to create is apparently 37.

With lockdep enabled, I get the following output:

=============================================
[ INFO: possible recursive locking detected ]
2.6.36-rc7-00040-gb0057c5 #5
---------------------------------------------
ping/2199 is trying to acquire lock:
 (_xmit_IPGRE){+.....}, at: [<c1139968>] dev_queue_xmit+0x37e/0x454

but task is already holding lock:
 (_xmit_IPGRE){+.....}, at: [<c1139968>] dev_queue_xmit+0x37e/0x454

other info that might help us debug this:
4 locks held by ping/2199:
 #0:  (sk_lock-AF_INET){+.+.+.}, at: [<c1168c46>] raw_sendmsg+0x590/0x64c
 #1:  (rcu_read_lock_bh){.+....}, at: [<c11395ea>] dev_queue_xmit+0x0/0x454
 #2:  (_xmit_IPGRE){+.....}, at: [<c1139968>] dev_queue_xmit+0x37e/0x454
 #3:  (rcu_read_lock_bh){.+....}, at: [<c11395ea>] dev_queue_xmit+0x0/0x454

stack backtrace:
Pid: 2199, comm: ping Not tainted 2.6.36-rc7-00040-gb0057c5 #5
Call Trace:
 [<c1187b3c>] ? printk+0xf/0x13
 [<c103a942>] __lock_acquire+0xbda/0x1311
 [<c103a32b>] ? __lock_acquire+0x5c3/0x1311
 [<c103b0d2>] lock_acquire+0x59/0x77
 [<c1139968>] ? dev_queue_xmit+0x37e/0x454
 [<c11898b4>] _raw_spin_lock+0x1b/0x2a
 [<c1139968>] ? dev_queue_xmit+0x37e/0x454
 [<c1139968>] dev_queue_xmit+0x37e/0x454
 [<c115292e>] ip_finish_output+0x29d/0x2c7
 [<c11529e2>] ip_output+0x8a/0x8f
 [<c1150b9c>] ip_local_out+0x50/0x53
 [<c11798e2>] ipgre_tunnel_xmit+0x6b8/0x757
 [<c114a188>] ? ip_rt_update_pmtu+0x0/0x60
 [<c11394fc>] dev_hard_start_xmit+0x33a/0x428
 [<c1139987>] dev_queue_xmit+0x39d/0x454
 [<c1151382>] ? ip_append_data+0x536/0x7dc
 [<c115292e>] ip_finish_output+0x29d/0x2c7
 [<c1151851>] ? ip_generic_getfrag+0x0/0x8a
 [<c11529e2>] ip_output+0x8a/0x8f
 [<c1150b9c>] ip_local_out+0x50/0x53
 [<c1150dff>] ip_push_pending_frames+0x260/0x2ad
 [<c1168c85>] raw_sendmsg+0x5cf/0x64c
 [<c11708ad>] inet_sendmsg+0x46/0x4f
 [<c112cea9>] sock_sendmsg+0xa4/0xba
 [<c105897d>] ? might_fault+0x35/0x6f
 [<c105897d>] ? might_fault+0x35/0x6f
 [<c1134614>] ? verify_iovec+0x3e/0x6a
 [<c112d2af>] sys_sendmsg+0x149/0x196
 [<c104b079>] ? unlock_page+0x3f/0x42
 [<c103b176>] ? lock_release_non_nested+0x86/0x221
 [<c105897d>] ? might_fault+0x35/0x6f
 [<c105897d>] ? might_fault+0x35/0x6f
 [<c112e287>] sys_socketcall+0x146/0x18b
 [<c10cb5c8>] ? trace_hardirqs_on_thunk+0xc/0x10
 [<c1189f5d>] syscall_call+0x7/0xb
------------[ cut here ]------------
WARNING: at kernel/softirq.c:143 local_bh_enable_ip+0x39/0xa5()
Hardware name: Bochs
Pid: 2199, comm: ping Not tainted 2.6.36-rc7-00040-gb0057c5 #5
Call Trace:
 [<c101a092>] warn_slowpath_common+0x60/0x75
 [<c101e534>] ? local_bh_enable_ip+0x39/0xa5
 [<c114b993>] ? rt_intern_hash+0x4da/0x4f9
 [<c101a0b6>] warn_slowpath_null+0xf/0x13
 [<c101e534>] local_bh_enable_ip+0x39/0xa5
 [<c1189d4e>] _raw_spin_unlock_bh+0x25/0x28
 [<c114b993>] rt_intern_hash+0x4da/0x4f9
 [<c114c1b8>] __ip_route_output_key+0x806/0x860
 [<c114c220>] ip_route_output_flow+0xe/0x3e
 [<c114c25c>] ip_route_output_key+0xc/0xe
 [<c11793d6>] ipgre_tunnel_xmit+0x1ac/0x757
 [<c1139968>] ? dev_queue_xmit+0x37e/0x454
 [<c11394fc>] dev_hard_start_xmit+0x33a/0x428
 [<c1139987>] dev_queue_xmit+0x39d/0x454
 [<c115292e>] ip_finish_output+0x29d/0x2c7
 [<c11529e2>] ip_output+0x8a/0x8f
 [<c1150b9c>] ip_local_out+0x50/0x53
 [<c11798e2>] ipgre_tunnel_xmit+0x6b8/0x757
 [<c114a188>] ? ip_rt_update_pmtu+0x0/0x60
 [<c11394fc>] dev_hard_start_xmit+0x33a/0x428
 [<c1139987>] dev_queue_xmit+0x39d/0x454
 [<c115292e>] ip_finish_output+0x29d/0x2c7
 [<c11529e2>] ip_output+0x8a/0x8f
 [<c1150b9c>] ip_local_out+0x50/0x53
 [<c11798e2>] ipgre_tunnel_xmit+0x6b8/0x757
 [<c114a188>] ? ip_rt_update_pmtu+0x0/0x60
 [<c11394fc>] dev_hard_start_xmit+0x33a/0x428
 [<c1139987>] dev_queue_xmit+0x39d/0x454
 [<c115292e>] ip_finish_output+0x29d/0x2c7
 [<c11529e2>] ip_output+0x8a/0x8f
 [<c1150b9c>] ip_local_out+0x50/0x53
 [<c11798e2>] ipgre_tunnel_xmit+0x6b8/0x757
 [<c114a188>] ? ip_rt_update_pmtu+0x0/0x60
 [<c11394fc>] dev_hard_start_xmit+0x33a/0x428
 [<c1139987>] dev_queue_xmit+0x39d/0x454
 [<c115292e>] ip_finish_output+0x29d/0x2c7
 [<c11529e2>] ip_output+0x8a/0x8f
 [<c1150b9c>] ip_local_out+0x50/0x53
 [<c11798e2>] ipgre_tunnel_xmit+0x6b8/0x757
 [<c114a188>] ? ip_rt_update_pmtu+0x0/0x60
 [<c11394fc>] dev_hard_start_xmit+0x33a/0x428
 [<c1139987>] dev_queue_xmit+0x39d/0x454
 [<c115292e>] ip_finish_output+0x29d/0x2c7
 [<c11529e2>] ip_output+0x8a/0x8f
 [<c1150b9c>] ip_local_out+0x50/0x53
 [<c11798e2>] ipgre_tunnel_xmit+0x6b8/0x757
 [<c114a188>] ? ip_rt_update_pmtu+0x0/0x60
 [<c11394fc>] dev_hard_start_xmit+0x33a/0x428
 [<c1139987>] dev_queue_xmit+0x39d/0x454
 [<c115292e>] ip_finish_output+0x29d/0x2c7
 [<c11529e2>] ip_output+0x8a/0x8f
 [<c1150b9c>] ip_local_out+0x50/0x53
 [<c11798e2>] ipgre_tunnel_xmit+0x6b8/0x757
 [<c114a188>] ? ip_rt_update_pmtu+0x0/0x60
 [<c11394fc>] dev_hard_start_xmit+0x33a/0x428
 [<c1139987>] dev_queue_xmit+0x39d/0x454
 [<c115292e>] ip_finish_output+0x29d/0x2c7
 [<c11529e2>] ip_output+0x8a/0x8f
 [<c1150b9c>] ip_local_out+0x50/0x53
 [<c11798e2>] ipgre_tunnel_xmit+0x6b8/0x757
 [<c114a188>] ? ip_rt_update_pmtu+0x0/0x60
 [<c11394fc>] dev_hard_start_xmit+0x33a/0x428
 [<c1139987>] dev_queue_xmit+0x39d/0x454
 [<c115292e>] ip_finish_output+0x29d/0x2c7
 [<c11529e2>] ip_output+0x8a/0x8f
 [<c1150b9c>] ip_local_out+0x50/0x53
 [<c11798e2>] ipgre_tunnel_xmit+0x6b8/0x757
 [<c114a188>] ? ip_rt_update_pmtu+0x0/0x60
 [<c11394fc>] dev_hard_start_xmit+0x33a/0x428
 [<c1139987>] dev_queue_xmit+0x39d/0x454
 [<c115292e>] ip_finish_output+0x29d/0x2c7
 [<c11529e2>] ip_output+0x8a/0x8f
 [<c1150b9c>] ip_local_out+0x50/0x53
 [<c11798e2>] ipgre_tunnel_xmit+0x6b8/0x757
 [<c114a188>] ? ip_rt_update_pmtu+0x0/0x60
 [<c11394fc>] dev_hard_start_xmit+0x33a/0x428
 [<c1139987>] dev_queue_xmit+0x39d/0x454
 [<c115292e>] ip_finish_output+0x29d/0x2c7
 [<c11529e2>] ip_output+0x8a/0x8f
 [<c1150b9c>] ip_local_out+0x50/0x53
 [<c11798e2>] ipgre_tunnel_xmit+0x6b8/0x757
 [<c114a188>] ? ip_rt_update_pmtu+0x0/0x60
 [<c11394fc>] dev_hard_start_xmit+0x33a/0x428
 [<c1139987>] dev_queue_xmit+0x39d/0x454
 [<c115292e>] ip_finish_output+0x29d/0x2c7
 [<c11529e2>] ip_output+0x8a/0x8f
 [<c1150b9c>] ip_local_out+0x50/0x53
 [<c11798e2>] ipgre_tunnel_xmit+0x6b8/0x757
 [<c114a188>] ? ip_rt_update_pmtu+0x0/0x60
 [<c11394fc>] dev_hard_start_xmit+0x33a/0x428
 [<c1139987>] dev_queue_xmit+0x39d/0x454
 [<c115292e>] ip_finish_output+0x29d/0x2c7
 [<c11529e2>] ip_output+0x8a/0x8f
 [<c1150b9c>] ip_local_out+0x50/0x53
 [<c11798e2>] ipgre_tunnel_xmit+0x6b8/0x757
 [<c114a188>] ? ip_rt_update_pmtu+0x0/0x60
 [<c11394fc>] dev_hard_start_xmit+0x33a/0x428
 [<c1139987>] dev_queue_xmit+0x39d/0x454
 [<c115292e>] ip_finish_output+0x29d/0x2c7
 [<c11529e2>] ip_output+0x8a/0x8f
 [<c1150b9c>] ip_local_out+0x50/0x53
 [<c11798e2>] ipgre_tunnel_xmit+0x6b8/0x757
 [<c114a188>] ? ip_rt_update_pmtu+0x0/0x60
 [<c11394fc>] dev_hard_start_xmit+0x33a/0x428
 [<c1139987>] dev_queue_xmit+0x39d/0x454
 [<c115292e>] ip_finish_output+0x29d/0x2c7
 [<c11529e2>] ip_output+0x8a/0x8f
 [<c1150b9c>] ip_local_out+0x50/0x53
 [<c11798e2>] ipgre_tunnel_xmit+0x6b8/0x757
 [<c114a188>] ? ip_rt_update_pmtu+0x0/0x60
 [<c11394fc>] dev_hard_start_xmit+0x33a/0x428
 [<c1139987>] dev_queue_xmit+0x39d/0x454
 [<c115292e>] ip_finish_output+0x29d/0x2c7
 [<c11529e2>] ip_output+0x8a/0x8f
 [<c1150b9c>] ip_local_out+0x50/0x53
 [<c11798e2>] ipgre_tunnel_xmit+0x6b8/0x757
 [<c114a188>] ? ip_rt_update_pmtu+0x0/0x60
 [<c11394fc>] dev_hard_start_xmit+0x33a/0x428
 [<c1139987>] dev_queue_xmit+0x39d/0x454
 [<c115292e>] ip_finish_output+0x29d/0x2c7
 [<c11529e2>] ip_output+0x8a/0x8f
 [<c1150b9c>] ip_local_out+0x50/0x53
 [<c11798e2>] ipgre_tunnel_xmit+0x6b8/0x757
 [<c114a188>] ? ip_rt_update_pmtu+0x0/0x60
 [<c11394fc>] dev_hard_start_xmit+0x33a/0x428
 [<c1139987>] dev_queue_xmit+0x39d/0x454
 [<c115292e>] ip_finish_output+0x29d/0x2c7
 [<c11529e2>] ip_output+0x8a/0x8f
 [<c1150b9c>] ip_local_out+0x50/0x53
 [<c11798e2>] ipgre_tunnel_xmit+0x6b8/0x757
 [<c114a188>] ? ip_rt_update_pmtu+0x0/0x60
 [<c11394fc>] dev_hard_start_xmit+0x33a/0x428
 [<c1139987>] dev_queue_xmit+0x39d/0x454
 [<c115292e>] ip_finish_output+0x29d/0x2c7
 [<c11529e2>] ip_output+0x8a/0x8f
 [<c1150b9c>] ip_local_out+0x50/0x53
 [<c11798e2>] ipgre_tunnel_xmit+0x6b8/0x757
 [<c114a188>] ? ip_rt_update_pmtu+0x0/0x60
 [<c11394fc>] dev_hard_start_xmit+0x33a/0x428
 [<c1139987>] dev_queue_xmit+0x39d/0x454
 [<c115292e>] ip_finish_output+0x29d/0x2c7
 [<c11529e2>] ip_output+0x8a/0x8f
 [<c1150b9c>] ip_local_out+0x50/0x53
 [<c11798e2>] ipgre_tunnel_xmit+0x6b8/0x757
 [<c114a188>] ? ip_rt_update_pmtu+0x0/0x60
 [<c11394fc>] dev_hard_start_xmit+0x33a/0x428
 [<c1139987>] dev_queue_xmit+0x39d/0x454
 [<c115292e>] ip_finish_output+0x29d/0x2c7
 [<c11529e2>] ip_output+0x8a/0x8f
 [<c1150b9c>] ip_local_out+0x50/0x53
 [<c11798e2>] ipgre_tunnel_xmit+0x6b8/0x757
 [<c114a188>] ? ip_rt_update_pmtu+0x0/0x60
 [<c11394fc>] dev_hard_start_xmit+0x33a/0x428
 [<c1139987>] dev_queue_xmit+0x39d/0x454
 [<c115292e>] ip_finish_output+0x29d/0x2c7
 [<c11529e2>] ip_output+0x8a/0x8f
 [<c1150b9c>] ip_local_out+0x50/0x53
 [<c11798e2>] ipgre_tunnel_xmit+0x6b8/0x757
 [<c114a188>] ? ip_rt_update_pmtu+0x0/0x60
 [<c11394fc>] dev_hard_start_xmit+0x33a/0x428
 [<c1139987>] dev_queue_xmit+0x39d/0x454
 [<c115292e>] ip_finish_output+0x29d/0x2c7
 [<c11529e2>] ip_output+0x8a/0x8f
 [<c1150b9c>] ip_local_out+0x50/0x53
 [<c11798e2>] ipgre_tunnel_xmit+0x6b8/0x757
 [<c114a188>] ? ip_rt_update_pmtu+0x0/0x60
 [<c11394fc>] dev_hard_start_xmit+0x33a/0x428
 [<c1139987>] dev_queue_xmit+0x39d/0x454
 [<c115292e>] ip_finish_output+0x29d/0x2c7
 [<c11529e2>] ip_output+0x8a/0x8f
 [<c1150b9c>] ip_local_out+0x50/0x53
 [<c11798e2>] ipgre_tunnel_xmit+0x6b8/0x757
 [<c114a188>] ? ip_rt_update_pmtu+0x0/0x60
 [<c11394fc>] dev_hard_start_xmit+0x33a/0x428
 [<c1139987>] dev_queue_xmit+0x39d/0x454
 [<c115292e>] ip_finish_output+0x29d/0x2c7
 [<c11529e2>] ip_output+0x8a/0x8f
 [<c1150b9c>] ip_local_out+0x50/0x53
 [<c11798e2>] ipgre_tunnel_xmit+0x6b8/0x757
 [<c114a188>] ? ip_rt_update_pmtu+0x0/0x60
 [<c11394fc>] dev_hard_start_xmit+0x33a/0x428
 [<c1139987>] dev_queue_xmit+0x39d/0x454
 [<c115292e>] ip_finish_output+0x29d/0x2c7
 [<c11529e2>] ip_output+0x8a/0x8f
 [<c1150b9c>] ip_local_out+0x50/0x53
 [<c11798e2>] ipgre_tunnel_xmit+0x6b8/0x757
 [<c114a188>] ? ip_rt_update_pmtu+0x0/0x60
 [<c11394fc>] dev_hard_start_xmit+0x33a/0x428
 [<c1139987>] dev_queue_xmit+0x39d/0x454
 [<c1151382>] ? ip_append_data+0x536/0x7dc
 [<c115292e>] ip_finish_output+0x29d/0x2c7
 [<c1151851>] ? ip_generic_getfrag+0x0/0x8a
 [<c11529e2>] ip_output+0x8a/0x8f
 [<c1150b9c>] ip_local_out+0x50/0x53
 [<c1150dff>] ip_push_pending_frames+0x260/0x2ad
 [<c1168c85>] raw_sendmsg+0x5cf/0x64c
 [<c11708ad>] inet_sendmsg+0x46/0x4f
 [<c112cea9>] sock_sendmsg+0xa4/0xba
 [<c105897d>] ? might_fault+0x35/0x6f
 [<c105897d>] ? might_fault+0x35/0x6f
 [<c1134614>] ? verify_iovec+0x3e/0x6a
 [<c112d2af>] sys_sendmsg+0x149/0x196
 [<c104b079>] ? unlock_page+0x3f/0x42
 [<c103b176>] ? lock_release_non_nested+0x86/0x221
 [<c105897d>] ? might_fault+0x35/0x6f
 [<c105897d>] ? might_fault+0x35/0x6f
 [<c112e287>] sys_socketcall+0x146/0x18b
 [<c10cb5c8>] ? trace_hardirqs_on_thunk+0xc/0x10
 [<c1189f5d>] syscall_call+0x7/0xb
 <IRQ> 

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.
[.config (text/x-mpsub, attachment)]
[tunnels.sh (application/x-shellscript, attachment)]
[signature.asc (application/pgp-signature, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#599816; Package linux-2.6. (Thu, 14 Oct 2010 04:15:04 GMT) Full text and rfc822 format available.

Acknowledgement sent to Eric Dumazet <eric.dumazet@gmail.com>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Thu, 14 Oct 2010 04:15:04 GMT) Full text and rfc822 format available.

Message #29 received at 599816@bugs.debian.org (full text, mbox):

From: Eric Dumazet <eric.dumazet@gmail.com>
To: Ben Hutchings <ben@decadent.org.uk>
Cc: netdev@vger.kernel.org, Beatrice Barbe <beatrice.barbe@gmail.com>, 599816@bugs.debian.org
Subject: Re: Nested GRE locking bug
Date: Thu, 14 Oct 2010 06:11:59 +0200
On Thursday 14 October 2010 at 05:00 +0100, Ben Hutchings wrote:
> Beatrice Barbe reported a reproducible crash after creating large
> numbers of nested GRE tunnels and then pinging with the source address
> forced.  I was able to reproduce this using net-2.6.  I'm attaching the
> kernel config I used and a script to reproduce this based on the script
> she provided.  The magic number of tunnels to create is apparently 37.
> 
> With lockdep enabled, I get the following output:
> 

That's a known problem, actually, called stack exhaustion :)

net-next-2.6 contains a fix for this, adding the per_cpu xmit_recursion
limit. We might push it to net-2.6.

Thanks

commit 745e20f1b626b1be4b100af5d4bf7b3439392f8f
Author: Eric Dumazet <eric.dumazet@gmail.com>
Date:   Wed Sep 29 13:23:09 2010 -0700

    net: add a recursion limit in xmit path
    
    As tunnel devices are going to be lockless, we need to make sure a
    misconfigured machine won't enter an infinite loop.
    
    Add a percpu variable, and limit to three the number of stacked xmits.
    
    Reported-by: Jesse Gross <jesse@nicira.com>
    Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/net/core/dev.c b/net/core/dev.c
index 48ad47f..50dacca 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2177,6 +2177,9 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
 	return rc;
 }
 
+static DEFINE_PER_CPU(int, xmit_recursion);
+#define RECURSION_LIMIT 3
+
 /**
  *	dev_queue_xmit - transmit a buffer
  *	@skb: buffer to transmit
@@ -2242,10 +2245,15 @@ int dev_queue_xmit(struct sk_buff *skb)
 
 		if (txq->xmit_lock_owner != cpu) {
 
+			if (__this_cpu_read(xmit_recursion) > RECURSION_LIMIT)
+				goto recursion_alert;
+
 			HARD_TX_LOCK(dev, txq, cpu);
 
 			if (!netif_tx_queue_stopped(txq)) {
+				__this_cpu_inc(xmit_recursion);
 				rc = dev_hard_start_xmit(skb, dev, txq);
+				__this_cpu_dec(xmit_recursion);
 				if (dev_xmit_complete(rc)) {
 					HARD_TX_UNLOCK(dev, txq);
 					goto out;
@@ -2257,7 +2265,9 @@ int dev_queue_xmit(struct sk_buff *skb)
 				       "queue packet!\n", dev->name);
 		} else {
 			/* Recursion is detected! It is possible,
-			 * unfortunately */
+			 * unfortunately
+			 */
+recursion_alert:
 			if (net_ratelimit())
 				printk(KERN_CRIT "Dead loop on virtual device "
 				       "%s, fix it urgently!\n", dev->name);






Information forwarded to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#599816; Package linux-2.6. (Tue, 19 Oct 2010 09:03:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to David Miller <davem@davemloft.net>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Tue, 19 Oct 2010 09:03:03 GMT) Full text and rfc822 format available.

Message #34 received at 599816@bugs.debian.org (full text, mbox):

From: David Miller <davem@davemloft.net>
To: eric.dumazet@gmail.com
Cc: ben@decadent.org.uk, netdev@vger.kernel.org, beatrice.barbe@gmail.com, 599816@bugs.debian.org
Subject: Re: Nested GRE locking bug
Date: Tue, 19 Oct 2010 01:53:03 -0700 (PDT)
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 14 Oct 2010 06:11:59 +0200

> net-next-2.6 contains a fix for this, adding the per_cpu
> xmit_recursion limit. We might push it to net-2.6

We need to think a bit more about this.

We are essentially now saying that one can only configure
tunnels 3 levels deep, and no more.

I can guarantee you someone out there uses at least 4,
perhaps more.

And those people will be broken by the new limit.

So putting this into net-2.6 with such a low limit will
be quite dangerous.




Information forwarded to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#599816; Package linux-2.6. (Tue, 19 Oct 2010 09:03:07 GMT) Full text and rfc822 format available.

Acknowledgement sent to Eric Dumazet <eric.dumazet@gmail.com>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Tue, 19 Oct 2010 09:03:07 GMT) Full text and rfc822 format available.

Message #39 received at 599816@bugs.debian.org (full text, mbox):

From: Eric Dumazet <eric.dumazet@gmail.com>
To: David Miller <davem@davemloft.net>
Cc: ben@decadent.org.uk, netdev@vger.kernel.org, beatrice.barbe@gmail.com, 599816@bugs.debian.org
Subject: Re: Nested GRE locking bug
Date: Tue, 19 Oct 2010 11:02:36 +0200
On Tuesday 19 October 2010 at 01:53 -0700, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Thu, 14 Oct 2010 06:11:59 +0200
> 
> > net-next-2.6 contains a fix for this, adding the per_cpu
> > xmit_recursion limit. We might push it to net-2.6
> 
> We need to think a bit more about this.
> 
> We are essentially now saying that one can only configure
> tunnels 3 levels deep, and no more.
> 
> I can guarantee you someone out there uses at least 4,
> perhaps more.
> 
> And those people will be broken by the new limit.
> 
> So putting this into net-2.6 with such a low limit will
> be quite dangerous.

Well, the limit is actually 4, but I get your point ;)
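The limit being 4 rather than 3 follows from where the test sits in the patch: the counter is read before `__this_cpu_inc()`, so the outermost transmit sees 0 and the check only fires once the counter is already above `RECURSION_LIMIT`. The sketch below (the function name `allowed_nesting()` is hypothetical) counts the depths that pass the same `> RECURSION_LIMIT` test, making the arithmetic explicit: a limit of L allows L + 1 stacked transmits.

```c
/* Hypothetical helper: how many nested transmits pass the patch's check
 *     if (__this_cpu_read(xmit_recursion) > RECURSION_LIMIT)
 *             goto recursion_alert;
 * given that each nested dev_queue_xmit() sees the counter equal to the
 * number of transmits already in progress, i.e. its own nesting depth. */
int allowed_nesting(int limit)
{
    int depth = 0;

    /* Keep nesting while the check would NOT jump to recursion_alert. */
    while (!(depth > limit))
        depth++;
    return depth;               /* always limit + 1 */
}
```

`allowed_nesting(3)` returns 4, and by the same off-by-one a limit of 10 would allow 11 levels.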







Information forwarded to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#599816; Package linux-2.6. (Mon, 25 Oct 2010 19:57:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to David Miller <davem@davemloft.net>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Mon, 25 Oct 2010 19:57:03 GMT) Full text and rfc822 format available.

Message #44 received at 599816@bugs.debian.org (full text, mbox):

From: David Miller <davem@davemloft.net>
To: eric.dumazet@gmail.com
Cc: ben@decadent.org.uk, netdev@vger.kernel.org, beatrice.barbe@gmail.com, 599816@bugs.debian.org
Subject: Re: Nested GRE locking bug
Date: Mon, 25 Oct 2010 12:53:47 -0700 (PDT)
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 19 Oct 2010 11:02:36 +0200

> On Tuesday 19 October 2010 at 01:53 -0700, David Miller wrote:
>> From: Eric Dumazet <eric.dumazet@gmail.com>
>> Date: Thu, 14 Oct 2010 06:11:59 +0200
>> 
>> > net-next-2.6 contains a fix for this, adding the per_cpu
>> > xmit_recursion limit. We might push it to net-2.6
>> 
>> We need to think a bit more about this.
>> 
>> We are essentially now saying that one can only configure
>> tunnels 3 levels deep, and no more.
>> 
>> I can guarantee you someone out there uses at least 4,
>> perhaps more.
>> 
>> And those people will be broken by the new limit.
>> 
>> So putting this into net-2.6 with such a low limit will
>> be quite dangerous.
> 
> Well limit is actually 4, but I get your point ;)

I'll commit the following to upstream, and submit a combined
patch to -stable.

--------------------
net: Increase xmit RECURSION_LIMIT to 10.

Three is definitely too low, and we know from reports that GRE tunnels
stacked as deeply as 37 levels cause stack overflows, so pick some
reasonable value between those two.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/core/dev.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 78b5a89..2c7da3a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2213,7 +2213,7 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
 }
 
 static DEFINE_PER_CPU(int, xmit_recursion);
-#define RECURSION_LIMIT 3
+#define RECURSION_LIMIT 10
 
 /**
  *	dev_queue_xmit - transmit a buffer
-- 
1.7.3.2





Information forwarded to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#599816; Package linux-2.6. (Mon, 25 Oct 2010 20:12:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to Eric Dumazet <eric.dumazet@gmail.com>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Mon, 25 Oct 2010 20:12:03 GMT) Full text and rfc822 format available.

Message #49 received at 599816@bugs.debian.org (full text, mbox):

From: Eric Dumazet <eric.dumazet@gmail.com>
To: David Miller <davem@davemloft.net>
Cc: ben@decadent.org.uk, netdev@vger.kernel.org, beatrice.barbe@gmail.com, 599816@bugs.debian.org
Subject: Re: Nested GRE locking bug
Date: Mon, 25 Oct 2010 22:08:25 +0200
On Monday 25 October 2010 at 12:53 -0700, David Miller wrote:

> I'll commit the following to upstream, and submit a combined
> patch to -stable.
> 
> --------------------
> net: Increase xmit RECURSION_LIMIT to 10.
> 
> Three is definitely too low, and we know from reports that GRE tunnels
> stacked as deeply as 37 levels cause stack overflows, so pick some
> reasonable value between those two.
> 
> Signed-off-by: David S. Miller <davem@davemloft.net>
> ---
>  net/core/dev.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 78b5a89..2c7da3a 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -2213,7 +2213,7 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
>  }
>  
>  static DEFINE_PER_CPU(int, xmit_recursion);
> -#define RECURSION_LIMIT 3
> +#define RECURSION_LIMIT 10
>  
>  /**
>   *	dev_queue_xmit - transmit a buffer


Perfect, thanks!







Reply sent to Moritz Muehlenhoff <jmm@inutil.org>:
You have taken responsibility. (Fri, 07 Jun 2013 18:24:09 GMT) Full text and rfc822 format available.

Notification sent to Beatrice Barbe <beatrice.barbe@gmail.com>:
Bug acknowledged by developer. (Fri, 07 Jun 2013 18:24:09 GMT) Full text and rfc822 format available.

Message #54 received at 599816-done@bugs.debian.org (full text, mbox):

From: Moritz Muehlenhoff <jmm@inutil.org>
To: Ben Hutchings <ben@decadent.org.uk>
Cc: Beatrice Barbe <beatrice.barbe@gmail.com>, 599816-done@bugs.debian.org
Subject: Re: Kernel panic on 36 and more encapsulated GRE tunnels
Date: Fri, 7 Jun 2013 20:20:06 +0200
Version: 2.6.37-1

On Thu, Oct 14, 2010 at 04:29:46AM +0100, Ben Hutchings wrote:
> On Wed, 2010-10-13 at 05:02 +0100, Ben Hutchings wrote:
> > On Mon, 2010-10-11 at 16:53 +0200, Beatrice Barbe wrote:
> > > Package: Debian GNU/Linux
> > > Version: 5.0.6
> > > 
> > > When creating 36 or more GRE tunnels, with the script attached to the
> > > mail, and sending a packet, I got a kernel panic.
> > > The last line in syslog is: “GRE over IPv4 tunneling driver”
> > > 
> > > # tunnels.sh 37
> > > 
> > > %ping -I 192.168.9.1 192.168.10.1
> > > ->kernel panic
> > 
> > Hmm, that's a weird bug.  I can reproduce it in Debian stable (Linux
> > 2.6.26) though it is fixed in testing (Linux 2.6.32).
> 
> > It looks like the send() call returns with a spinlock held.  I haven't
> > yet found the change between versions 2.6.26 and 2.6.32 that fixed this,
> > but I will keep looking.
> 
> In fact this has not been fixed, and I am able to reproduce it with the
> very latest version of Linux.

Fixed in 2.6.37 with 11a766ce915fc9f8663714eac6d59239388534ea

Cheers,
        Moritz




Bug archived. Request was from Debbugs Internal Request <owner@bugs.debian.org> to internal_control@bugs.debian.org. (Sat, 06 Jul 2013 07:26:37 GMT) Full text and rfc822 format available.



Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Sat Apr 19 14:41:09 2014; Machine Name: buxtehude.debian.org

Debian Bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.