Debian Bug report logs - #647185
linux-2.6: kernel null pointer dereference while adding SAN path

Package: linux-2.6; Maintainer for linux-2.6 is (unknown);

Reported by: Bernd Zeimetz <b.zeimetz@conova.com>

Date: Mon, 31 Oct 2011 13:57:01 UTC

Severity: normal

Found in version 2.6.32-38

Done: Moritz Mühlenhoff <jmm@inutil.org>

Bug is archived. No further changes may be made.


Report forwarded to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#647185; Package linux-2.6. (Mon, 31 Oct 2011 13:57:04 GMT)


Acknowledgement sent to Bernd Zeimetz <b.zeimetz@conova.com>:
New Bug report received and forwarded. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Mon, 31 Oct 2011 13:57:05 GMT)


Message #5 received at submit@bugs.debian.org:

From: Bernd Zeimetz <b.zeimetz@conova.com>
To: <submit@bugs.debian.org>
Subject: linux-2.6: kernel null pointer dereference while adding SAN path
Date: Mon, 31 Oct 2011 14:35:47 +0100

Package: linux-2.6
Version: 2.6.32-38

Hi,

removing paths to our SAN and adding them back results in

[  951.569561] device-mapper: table: 253:2: sde too small for target: start=0, len=140465493850188, dev_size=627107840
[  951.571750] BUG: unable to handle kernel NULL pointer dereference at (null)
[  951.571876] IP: [<(null)>] (null)
[  951.571961] PGD 6500c1067 PUD 650135067 PMD 0 
[  951.578673] Oops: 0010 [#1] SMP 
[  951.578788] last sysfs file: /sys/devices/virtual/block/dm-3/uevent
[  951.578846] CPU 16 
[  951.578928] Modules linked in: 8021q garp stp ext4 jbd2 crc16 dm_round_robin dm_multipath scsi_dh bonding ipmi_devintf ipmi_si ipmi_msghandler ohci_hcd radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core snd_pcm snd_timer snd soundcore snd_page_alloc hpilo hpwdt joydev pcspkr psmouse evdev serio_raw power_meter container processor button ext3 jbd mbcache dm_mod raid10 raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx raid1 raid0 multipath linear md_mod sd_mod crc_t10dif sg usbhid sr_mod hid cdrom ata_generic hpsa ata_piix thermal uhci_hcd cciss ehci_hcd qla2xxx scsi_transport_fc libata scsi_tgt bnx2 usbcore qlcnic nls_base scsi_mod thermal_sys [last unloaded: scsi_wait_scan]
[  951.581772] Pid: 5801, comm: blkid Not tainted 2.6.32-5-amd64 #1 ProLiant DL380 G7
[  951.581845] RIP: 0010:[<0000000000000000>]  [<(null)>] (null)
[  951.581934] RSP: 0018:ffff88071b9c5b80  EFLAGS: 00010006
[  951.581989] RAX: ffff880e1ad3e880 RBX: ffff880e1a4888d0 RCX: 0000000000000000
[  951.582054] RDX: 0000000000000002 RSI: 0000000000000001 RDI: ffff880e1a4888d0
[  951.582116] RBP: ffff880e1a4888d0 R08: ffff880719cb33e8 R09: ffff880719f12840
[  951.582175] R10: 0000000100027c26 R11: ffff88065b555500 R12: ffff880e1a4888d0
[  951.582234] R13: 0000000000000002 R14: ffff88071bcc1d60 R15: ffff88071bcc1c44
[  951.582297] FS:  00007f5c1037d740(0000) GS:ffff88001a500000(0000) knlGS:0000000000000000
[  951.582372] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  951.582429] CR2: 0000000000000000 CR3: 000000071b6d2000 CR4: 00000000000006e0
[  951.582488] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  951.582546] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  951.582606] Process blkid (pid: 5801, threadinfo ffff88071b9c4000, task ffff88071a31bf90)
[  951.582680] Stack:
[  951.582729]  ffffffff8117629e ffff88071bbd7dc8 ffffffff81176c40 ffff88071bbd7dc8
[  951.582885] <0> ffff88071bbd7dc8 ffff880e1a4888d0 0000000000000096 ffff88071bcc1c00
[  951.583118] <0> ffffffff8117dec9 ffff88071bbd7dc8 ffffc9000c8da040 ffff88071a2fac10
[  951.583397] Call Trace:
[  951.583452]  [<ffffffff8117629e>] ? elv_drain_elevator+0x16/0x5a
[  951.583510]  [<ffffffff81176c40>] ? elv_insert+0x91/0x260
[  951.583568]  [<ffffffff8117dec9>] ? blk_insert_cloned_request+0x4f/0x67
[  951.583630]  [<ffffffffa022d90f>] ? dm_dispatch_request+0x33/0x59 [dm_mod]
[  951.583691]  [<ffffffffa022eedb>] ? dm_request_fn+0x121/0x1a2 [dm_mod]
[  951.583752]  [<ffffffff810b43e3>] ? sync_page_killable+0x0/0x2f
[  951.583810]  [<ffffffff8117f07a>] ? generic_unplug_device+0x21/0x34
[  951.583870]  [<ffffffffa022dac8>] ? dm_unplug_all+0x33/0x4c [dm_mod]
[  951.583928]  [<ffffffff810b43d9>] ? sync_page+0x3c/0x46
[  951.583984]  [<ffffffff810b43ec>] ? sync_page_killable+0x9/0x2f
[  951.584043]  [<ffffffff812fb80a>] ? __wait_on_bit_lock+0x3f/0x84
[  951.584101]  [<ffffffff810b42e8>] ? __lock_page_killable+0x5d/0x63
[  951.584160]  [<ffffffff81064fc0>] ? wake_bit_function+0x0/0x23
[  951.584217]  [<ffffffff810b42f7>] ? lock_page_killable+0x9/0x1f
[  951.584274]  [<ffffffff810b5917>] ? generic_file_aio_read+0x363/0x536
[  951.584334]  [<ffffffff810eed05>] ? do_sync_read+0xce/0x113
[  951.584391]  [<ffffffff81064f92>] ? autoremove_wake_function+0x0/0x2e
[  951.584451]  [<ffffffff810ccd36>] ? handle_mm_fault+0x3b8/0x80f
[  951.584508]  [<ffffffff810ef728>] ? vfs_read+0xa6/0xff
[  951.584564]  [<ffffffff810ef83d>] ? sys_read+0x45/0x6e
[  951.584621]  [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b
[  951.584677] Code:  Bad RIP value.
[  951.584795] RIP  [<(null)>] (null)
[  951.584879]  RSP <ffff88071b9c5b80>
[  951.584932] CR2: 0000000000000000
[  951.584985] ---[ end trace 71dd7f009a29d813 ]---
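
The first line of the log above gives the target geometry that device-mapper rejected. Device-mapper tables express `start`, `len` and `dev_size` in 512-byte sectors, so the numbers can be converted to sizes with a few lines of Python. This is only an illustrative sketch: the helper names `parse_dm_too_small` and `sectors_to_gib` are made up here and are not part of any tool. The reported `dev_size` works out to roughly 299 GiB, while the requested `len` is many orders of magnitude larger, which is why the table load fails.

```python
import re

# Device-mapper table offsets and lengths are counted in 512-byte sectors.
SECTOR_BYTES = 512


def parse_dm_too_small(line):
    """Extract start, len and dev_size (in sectors) from a
    'too small for target' device-mapper message.
    Hypothetical helper written for this example only."""
    m = re.search(r"start=(\d+), len=(\d+), dev_size=(\d+)", line)
    if m is None:
        raise ValueError("not a 'too small for target' message")
    start, length, dev_size = (int(g) for g in m.groups())
    return {"start": start, "len": length, "dev_size": dev_size}


def sectors_to_gib(sectors):
    """Convert a sector count to GiB."""
    return sectors * SECTOR_BYTES / 2**30


line = ("device-mapper: table: 253:2: sde too small for target: "
        "start=0, len=140465493850188, dev_size=627107840")
info = parse_dm_too_small(line)

print(f"device size : {sectors_to_gib(info['dev_size']):.1f} GiB")
print(f"target len  : {sectors_to_gib(info['len']):.1f} GiB")
# The table is rejected because the target would end beyond the device.
print("fits on device:", info["start"] + info["len"] <= info["dev_size"])
```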


As I'm adding back the old paths at pretty much the same time, it seems
to me that blkid tries to access one of the devices I've just removed.
But that should not result in a NULL pointer dereference, nor should it
render access to the LUN faulty and make multipath completely forget
what kind of hardware is behind it. Afterwards, multipath -ll shows:

lun_alias (00009800064700000684a656930380000) dm-1 ,
size=4.9T features='0' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=0 status=active
| |- #:#:#:# -   #:#   active faulty running
| `- #:#:#:# -   #:#   active faulty running
`-+- policy='round-robin 0' prio=0 status=enabled
  |- #:#:#:# -   #:#   active faulty running
  `- #:#:#:# -   #:#   active faulty running


The expected output of multipath -ll would look more like this:

lun_alias (00009800064700000684a656930380000) dm-1 NETAPP,LUN
size=299G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=8 status=active
| |- 1:0:2:0 sdk 8:160 active ready running
| `- 0:0:2:0 sdm 8:192 active ready running
`-+- policy='round-robin 0' prio=2 status=enabled
  |- 0:0:3:0 sdn 8:208 active ready running
  `- 1:0:3:0 sdl 8:176 active ready running
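
The difference between the two listings can also be checked mechanically. The following is a minimal sketch, assuming the path lines always have the six whitespace-separated fields seen above; the field labels (`hcil`, `checker_state`, ...) and the function names are this example's own choices, not multipath-tools terminology. It treats any path whose second state column is not `ready` as faulty.

```python
import re

# Path lines look like "| |- 1:0:2:0 sdk 8:160 active ready running".
# Group lines ("|-+- policy=...") and the header lines do not match.
PATH_LINE = re.compile(
    r"^[| ]*[|`]- (\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s*$"
)
# Labels chosen for this sketch only.
FIELDS = ("hcil", "dev", "devno", "dm_state", "checker_state", "online_state")


def parse_paths(multipath_ll_output):
    """Return one dict per path line found in 'multipath -ll' output."""
    paths = []
    for line in multipath_ll_output.splitlines():
        m = PATH_LINE.match(line)
        if m:
            paths.append(dict(zip(FIELDS, m.groups())))
    return paths


def faulty_paths(multipath_ll_output):
    """Return the paths whose checker state is anything other than 'ready'."""
    return [p for p in parse_paths(multipath_ll_output)
            if p["checker_state"] != "ready"]


BROKEN = """\
lun_alias (00009800064700000684a656930380000) dm-1 ,
size=4.9T features='0' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=0 status=active
| |- #:#:#:# -   #:#   active faulty running
| `- #:#:#:# -   #:#   active faulty running
`-+- policy='round-robin 0' prio=0 status=enabled
  |- #:#:#:# -   #:#   active faulty running
  `- #:#:#:# -   #:#   active faulty running
"""

EXPECTED = """\
lun_alias (00009800064700000684a656930380000) dm-1 NETAPP,LUN
size=299G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=8 status=active
| |- 1:0:2:0 sdk 8:160 active ready running
| `- 0:0:2:0 sdm 8:192 active ready running
`-+- policy='round-robin 0' prio=2 status=enabled
  |- 0:0:3:0 sdn 8:208 active ready running
  `- 1:0:3:0 sdl 8:176 active ready running
"""

print("faulty paths after the oops:", len(faulty_paths(BROKEN)))
print("faulty paths when healthy  :", len(faulty_paths(EXPECTED)))
```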


Cheers,

Bernd


-------------------------------------------------
Bernd Zeimetz
Systems Engineer

conova communications GmbH

web   |  www.conova.com
mail  |  b.zeimetz@conova.com

ZENTRALE SALZBURG
Karolingerstraße 36A
A - 5020 Salzburg

tel   |  +43/(0)662 2200-313
fax   |  +43/(0)662 2200-209
------------------------------------------------
	




Information forwarded to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#647185; Package linux-2.6. (Wed, 02 Nov 2011 05:00:03 GMT)


Acknowledgement sent to Ben Hutchings <ben@decadent.org.uk>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Wed, 02 Nov 2011 05:00:03 GMT)


Message #10 received at 647185@bugs.debian.org:

From: Ben Hutchings <ben@decadent.org.uk>
To: Bernd Zeimetz <b.zeimetz@conova.com>, 647185@bugs.debian.org
Subject: Re: Bug#647185: linux-2.6: kernel null pointer dereference while adding SAN path
Date: Wed, 02 Nov 2011 04:56:42 +0000

On Mon, 2011-10-31 at 14:35 +0100, Bernd Zeimetz wrote:
> Package: linux-2.6
> Version: 2.6.32-38
> 
> Hi,
> 
> removing paths to our SAN and adding them back results in
[...]

Does the attached patch help?  Instructions for building a patched
kernel can be found at:

http://kernel-handbook.alioth.debian.org/ch-common-tasks.html#s-common-official

Ben.

-- 
Ben Hutchings
Sturgeon's Law: Ninety percent of everything is crap.
[0001-dm-prevent-access-to-md-being-deleted.patch (text/x-patch, attachment)]

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#647185; Package linux-2.6. (Wed, 02 Nov 2011 05:06:03 GMT)


Acknowledgement sent to Ben Hutchings <ben@decadent.org.uk>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Wed, 02 Nov 2011 05:06:03 GMT)


Message #15 received at 647185@bugs.debian.org:

From: Ben Hutchings <ben@decadent.org.uk>
To: Bernd Zeimetz <b.zeimetz@conova.com>
Cc: 647185@bugs.debian.org
Subject: Re: Bug#647185: linux-2.6: kernel null pointer dereference while adding SAN path
Date: Wed, 02 Nov 2011 05:02:43 +0000

On Wed, 2011-11-02 at 04:56 +0000, Ben Hutchings wrote:
> On Mon, 2011-10-31 at 14:35 +0100, Bernd Zeimetz wrote:
> > Package: linux-2.6
> > Version: 2.6.32-38
> > 
> > Hi,
> > 
> > removing paths to our SAN and adding them back results in
> [...]
> 
> Does the attached patch help?  Instructions for building a patched
> kernel can be found at:
> 
> http://kernel-handbook.alioth.debian.org/ch-common-tasks.html#s-common-official

Sorry, you'll need this patch as well.

Ben.

[0001-dm-add-dm_deleting_md-function.patch (text/x-patch, attachment)]

Information forwarded to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#647185; Package linux-2.6. (Wed, 02 Nov 2011 14:03:06 GMT)


Acknowledgement sent to Bernd Zeimetz <b.zeimetz@conova.com>:
Extra info received and forwarded to list. Copy sent to Debian Kernel Team <debian-kernel@lists.debian.org>. (Wed, 02 Nov 2011 14:03:07 GMT)


Message #20 received at 647185@bugs.debian.org:

From: Bernd Zeimetz <b.zeimetz@conova.com>
To: Ben Hutchings <ben@decadent.org.uk>
Cc: <647185@bugs.debian.org>
Subject: Re: Bug#647185: linux-2.6: kernel null pointer dereference while adding SAN path
Date: Wed, 2 Nov 2011 14:34:07 +0100

Hi Ben!


>>> removing paths to our SAN and adding them back results in
>> [...]
>>
>> Does the attached patch help?  Instructions for building a patched
>> kernel can be found at:
>>
>> http://kernel-handbook.alioth.debian.org/ch-common-tasks.html#s-common-official
> 
> Sorry, you'll need this patch as well.

Thanks for the patches! After applying them we run into the following oops instead.
Please note that
- this only seems to happen when there is a partition on the LUN; using LVM on the
  DM device directly doesn't seem to trigger the bug
- you'll have to fix the vserver patch when the two dm patches are applied
  (not the issue here, but I wanted to mention it before you stumble over it)


[ 2000.379681] device-mapper: multipath: Failing path 8:32.
[ 2000.380022] device-mapper: multipath: Failing path 8:48.
[ 2000.381502] device-mapper: table: 254:2: multipath: error getting device
[ 2000.381533] device-mapper: ioctl: error adding target to table
[ 2000.382355] device-mapper: table: 254:2: multipath: error getting device
[ 2000.382385] device-mapper: ioctl: error adding target to table
[ 2000.411143] general protection fault: 0000 [#1] SMP 
[ 2000.411175] last sysfs file: /sys/devices/pci0000:00/0000:00:07.0/0000:0e:00.0/host0/rport-0:0-3/target0:0:3/0:0:3:0/block/sdd/uevent
[ 2000.411229] CPU 4 
[ 2000.411251] Modules linked in: 8021q garp stp ext4 jbd2 crc16 dm_round_robin dm_multipath scsi_dh bonding ipmi_devintf ipmi_si ipmi_msghandler ohci_hcd snd_pcm snd_timer radeon snd ttm soundcore drm_kms_helper drm i2c_algo_bit i2c_core hpilo snd_page_alloc hpwdt joydev pcspkr evdev button power_meter processor container psmouse serio_raw ext3 jbd mbcache dm_mod sd_mod crc_t10dif sg sr_mod cdrom ata_generic usbhid hid qla2xxx hpsa scsi_transport_fc uhci_hcd thermal scsi_tgt ata_piix ehci_hcd libata bnx2 qlcnic usbcore cciss nls_base scsi_mod thermal_sys [last unloaded: scsi_wait_scan]
[ 2000.411579] Pid: 8402, comm: multipath Not tainted 2.6.32-5-amd64 #1 ProLiant DL380 G7
[ 2000.411623] RIP: 0010:[<ffffffff8117629b>]  [<ffffffff8117629b>] elv_drain_elevator+0x13/0x5a
[ 2000.411674] RSP: 0018:ffff880e1b2cfd18  EFLAGS: 00010002
[ 2000.411700] RAX: ffff880719b0cd80 RBX: ffff880719a291a0 RCX: 0000000000000000
[ 2000.411729] RDX: 0000000000000002 RSI: 0000000000000001 RDI: ffff880719a291a0
[ 2000.411758] RBP: ffff880719a291a0 R08: ffff88071a65be70 R09: ffff88071a701840
[ 2000.411787] R10: 0000000100067c84 R11: ffff880713a8a780 R12: ffff880719a291a0
[ 2000.411816] R13: 0000000000000002 R14: ffff880719707160 R15: ffff880719707044
[ 2000.411845] FS:  00007f3b1d07a7a0(0000) GS:ffff88001a440000(0000) knlGS:0000000000000000
[ 2000.411889] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2000.411916] CR2: 00000000025ea210 CR3: 0000000e1a53f000 CR4: 00000000000006e0
[ 2000.411945] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2000.411974] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 2000.418573] Process multipath (pid: 8402, threadinfo ffff880e1b2ce000, task ffff880e1b93e9f0)
[ 2000.418618] Stack:
[ 2000.418637]  ffff880248cc0018 ffffffff81176c40 ffff880248cc0018 ffff880248cc0018
[ 2000.418674] <0> ffff880719a291a0 0000000000000096 ffff880719707000 ffffffff8117dec9
[ 2000.418726] <0> ffff880248cc0018 ffffc9000ca8b040 ffff880719a2b4e0 ffffffffa019492b
[ 2000.418795] Call Trace:
[ 2000.418818]  [<ffffffff81176c40>] ? elv_insert+0x91/0x260
[ 2000.418847]  [<ffffffff8117dec9>] ? blk_insert_cloned_request+0x4f/0x67
[ 2000.418879]  [<ffffffffa019492b>] ? dm_dispatch_request+0x33/0x59 [dm_mod]
[ 2000.418912]  [<ffffffffa0195ef7>] ? dm_request_fn+0x121/0x1a2 [dm_mod]
[ 2000.418941]  [<ffffffff8117eef6>] ? __blk_run_queue+0x35/0x66
[ 2000.418970]  [<ffffffffa0194a43>] ? dm_resume+0xb5/0x123 [dm_mod]
[ 2000.419001]  [<ffffffffa0199071>] ? dev_suspend+0x0/0x196 [dm_mod]
[ 2000.419032]  [<ffffffffa01991d0>] ? dev_suspend+0x15f/0x196 [dm_mod]
[ 2000.419063]  [<ffffffffa0199c24>] ? ctl_ioctl+0x1c6/0x20e [dm_mod]
[ 2000.419092]  [<ffffffffa0199c7a>] ? dm_ctl_ioctl+0xe/0x12 [dm_mod]
[ 2000.419124]  [<ffffffff810fab66>] ? vfs_ioctl+0x21/0x6c
[ 2000.419150]  [<ffffffff810fb0b4>] ? do_vfs_ioctl+0x48d/0x4cb
[ 2000.419178]  [<ffffffff810d066d>] ? remove_vma+0x6b/0x72
[ 2000.419205]  [<ffffffff810d1782>] ? do_munmap+0x307/0x329
[ 2000.419231]  [<ffffffff810fb143>] ? sys_ioctl+0x51/0x70
[ 2000.419258]  [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b
[ 2000.419285] Code: 41 0f 18 09 75 bb 48 8b 02 48 89 70 08 48 89 06 48 89 56 08 48 89 32 c3 53 48 89 fb 48 8b 43 18 be 01 00 00 00 48 89 df 48 8b 00 <ff> 50 20 85 c0 75 ea 8b 8b b0 03 00 00 85 c9 74 34 8b 15 ca ea 
[ 2000.419478] RIP  [<ffffffff8117629b>] elv_drain_elevator+0x13/0x5a
[ 2000.419507]  RSP <ffff880e1b2cfd18>
[ 2000.419759] ---[ end trace cff8452e221a0978 ]---
[ 2130.432222] qla2xxx 0000:0e:00.0: LIP reset occurred (f700).
[ 2130.647148] qla2xxx 0000:0e:00.1: LOOP DOWN detected (2 3 0).
[ 2130.808220] qla2xxx 0000:0e:00.0: LIP occurred (f700).
[ 2130.808342] qla2xxx 0000:0e:00.0: LIP reset occurred (f7f7).
[ 2130.857976] qla2xxx 0000:0e:00.0: LOOP UP detected (8 Gbps).
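
To compare this oops with the one from the original report, the call-trace frames can be extracted and intersected. The sketch below is an illustration only: `parse_frames` is a hypothetical helper, and the two sample strings are the top frames copied from the two traces in this report. As far as the `?` frames can be trusted, both traces share the path dm_request_fn -> dm_dispatch_request -> blk_insert_cloned_request -> elv_insert before faulting in or just after elv_drain_elevator.

```python
import re

# Matches frames such as
#   [<ffffffffa019492b>] ? dm_dispatch_request+0x33/0x59 [dm_mod]
FRAME = re.compile(
    r"\[<[0-9a-f]+>\] \? (\w+)\+0x([0-9a-f]+)/0x[0-9a-f]+(?: \[(\w+)\])?"
)


def parse_frames(log_text):
    """Return (symbol, offset, module) per call-trace frame.
    module is None for code built into the kernel image."""
    return [(sym, int(off, 16), mod or None)
            for sym, off, mod in FRAME.findall(log_text)]


FIRST = """\
[  951.583452]  [<ffffffff8117629e>] ? elv_drain_elevator+0x16/0x5a
[  951.583510]  [<ffffffff81176c40>] ? elv_insert+0x91/0x260
[  951.583568]  [<ffffffff8117dec9>] ? blk_insert_cloned_request+0x4f/0x67
[  951.583630]  [<ffffffffa022d90f>] ? dm_dispatch_request+0x33/0x59 [dm_mod]
[  951.583691]  [<ffffffffa022eedb>] ? dm_request_fn+0x121/0x1a2 [dm_mod]
"""

SECOND = """\
[ 2000.418818]  [<ffffffff81176c40>] ? elv_insert+0x91/0x260
[ 2000.418847]  [<ffffffff8117dec9>] ? blk_insert_cloned_request+0x4f/0x67
[ 2000.418879]  [<ffffffffa019492b>] ? dm_dispatch_request+0x33/0x59 [dm_mod]
[ 2000.418912]  [<ffffffffa0195ef7>] ? dm_request_fn+0x121/0x1a2 [dm_mod]
[ 2000.418941]  [<ffffffff8117eef6>] ? __blk_run_queue+0x35/0x66
"""

first = parse_frames(FIRST)
second = set(parse_frames(SECOND))

# Frames (same symbol, same offset, same module) present in both traces.
shared = [frame for frame in first if frame in second]
for symbol, offset, module in shared:
    print(f"{symbol}+{offset:#x} [{module or 'built-in'}]")
```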


Cheers,

Bernd

Reply sent to Moritz Mühlenhoff <jmm@inutil.org>:
You have taken responsibility. (Fri, 31 May 2013 16:06:29 GMT)


Notification sent to Bernd Zeimetz <b.zeimetz@conova.com>:
Bug acknowledged by developer. (Fri, 31 May 2013 16:06:29 GMT)


Message #25 received at 647185-done@bugs.debian.org:

From: Moritz Mühlenhoff <jmm@inutil.org>
To: 647723-done@bugs.debian.org, 647471-done@bugs.debian.org, 647185-done@bugs.debian.org, 645301-done@bugs.debian.org, 644736-done@bugs.debian.org, 644677-done@bugs.debian.org, 646081-done@bugs.debian.org, 646025-done@bugs.debian.org
Subject: Closing
Date: Fri, 31 May 2013 18:05:04 +0200

Hi,
your bug has been filed against the "linux-2.6" source package and was filed for
a kernel older than the recently released Debian 7.0 / Wheezy with a severity
less than important.

We don't have the resources to reproduce the complete backlog of all older kernel
bugs, so we're closing this bug for now. If you can reproduce the bug with Debian Wheezy
or a more recent kernel from testing or unstable, please reopen the bug by sending
a mail to control@bugs.debian.org with the following three commands included in the
mail:

reopen BUGNUMBER
reassign BUGNUMBER src:linux
thanks
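
As an illustration, the three commands can be filled in for this bug number and wrapped in a mail addressed to control@bugs.debian.org. This is a sketch under stated assumptions: the function names, the subject line and the sender address `reporter@example.org` are placeholders invented for the example, and the code only builds the message without sending it.

```python
from email.message import EmailMessage


def control_commands(bug_number):
    """Return the three control commands quoted above with BUGNUMBER filled in."""
    return [
        f"reopen {bug_number}",
        f"reassign {bug_number} src:linux",
        "thanks",
    ]


def build_control_mail(bug_number, sender):
    """Build (but do not send) a mail for control@bugs.debian.org.
    The subject line is an arbitrary choice made for this sketch."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = "control@bugs.debian.org"
    msg["Subject"] = f"reopening {bug_number}"
    msg.set_content("\n".join(control_commands(bug_number)) + "\n")
    return msg


# reporter@example.org is a placeholder address.
mail = build_control_mail(647185, "reporter@example.org")
print(mail.get_content())
```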

Cheers,
        Moritz



Bug archived. Request was from Debbugs Internal Request <owner@bugs.debian.org> to internal_control@bugs.debian.org. (Sat, 29 Jun 2013 07:41:15 GMT)


Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Fri Jan 12 19:38:10 2018; Machine Name: beach
