Debian Bug report logs - #593883
linux-2.6: race reading /proc/mounts causes umount failure and potential for serious dataloss

Package: linux-2.6; Maintainer for linux-2.6 is Debian Kernel Team <debian-kernel@lists.debian.org>;

Reported by: Greg Price <price@ksplice.com>

Date: Wed, 18 Aug 2010 21:15:01 UTC

Severity: serious

Done: Ben Hutchings <ben@decadent.org.uk>

Bug is archived. No further changes may be made.

Toggle useless messages

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to debian-bugs-dist@lists.debian.org, Debian buildd-tools Developers <buildd-tools-devel@lists.alioth.debian.org>:
Bug#593516; Package schroot. (Wed, 18 Aug 2010 21:15:04 GMT) Full text and rfc822 format available.

Acknowledgement sent to Greg Price <price@ksplice.com>:
New Bug report received and forwarded. Copy sent to Debian buildd-tools Developers <buildd-tools-devel@lists.alioth.debian.org>. (Wed, 18 Aug 2010 21:15:04 GMT) Full text and rfc822 format available.

Message #5 received at submit@bugs.debian.org (full text, mbox):

From: Greg Price <price@ksplice.com>
To: submit@bugs.debian.org
Subject: schroot: race condition in /proc/mounts can break cleanup
Date: Wed, 18 Aug 2010 17:12:35 -0400
Package: schroot
Version: 1.3.2-1
Severity: normal

Hello,

schroot suffers from a race condition where if two schroot sessions
are ended concurrently, one of them may fail to clean up properly.  On
some intensely schroot-using build machines, I see this failure
regularly.  The symptom looks like this:

  E: 10mount: umount:
/var/lib/schroot/mount/somechroot-source-bd8cdb9f-611a-4991-869c-e479fa673ec7:
device is busy.

and then the 'schroot' command exits with failure.

The issue is that /proc/mounts is unreliable.  It turns out that when
you read it concurrently with a umount, you can end up missing records
for mounts that have nothing to do with the ones that were unmounted.
Arguably this is a kernel bug, but there is no simple fix, so
/proc/mounts will definitely remain unreliable at least for squeeze.

Consequently because schroot-listmounts relies on /proc/mounts, it too
is unreliable.  So when do_umount_all() in etc/setup.d/10mount does
what amounts to this (but with better reporting and error handling):

    "$LIBEXEC_DIR/schroot-listmounts" -m "$base" | xargs -rn1 umount

the schroot-listmounts may skip a record if another process is ending
another schroot session, or unmounting something for any other reason.
Then one of the internal mounts, like .../tmp or .../dev/pts, remains
mounted, and the attempt to unmount the base directory fails.

Currently I'm working around the issue with a flock around the body of
do_umount_all().  That's easy and is enough to address the problem
when no other umounts are happening on the system.  A real fix would
probably have to be for schroot to record for itself the list of
mounts it makes inside a session's tree, and rely on that list instead
of on /proc/mounts.

Cheers,
Greg




Information forwarded to debian-bugs-dist@lists.debian.org, Debian buildd-tools Developers <buildd-tools-devel@lists.alioth.debian.org>:
Bug#593516; Package schroot. (Thu, 19 Aug 2010 19:24:05 GMT) Full text and rfc822 format available.

Acknowledgement sent to Roger Leigh <rleigh@codelibre.net>:
Extra info received and forwarded to list. Copy sent to Debian buildd-tools Developers <buildd-tools-devel@lists.alioth.debian.org>. (Thu, 19 Aug 2010 19:24:05 GMT) Full text and rfc822 format available.

Message #10 received at 593516@bugs.debian.org (full text, mbox):

From: Roger Leigh <rleigh@codelibre.net>
To: Greg Price <price@ksplice.com>, 593516@bugs.debian.org
Subject: Re: [buildd-tools-devel] Bug#593516: schroot: race condition in /proc/mounts can break cleanup
Date: Thu, 19 Aug 2010 20:21:31 +0100
[Message part 1 (text/plain, inline)]
On Wed, Aug 18, 2010 at 05:12:35PM -0400, Greg Price wrote:
> schroot suffers from a race condition where if two schroot sessions
> are ended concurrently, one of them may fail to clean up properly.  On
> some intensely schroot-using build machines, I see this failure
> regularly.  The symptom looks like this:
> 
>   E: 10mount: umount:
> /var/lib/schroot/mount/somechroot-source-bd8cdb9f-611a-4991-869c-e479fa673ec7:
> device is busy.
> 
> and then the 'schroot' command exits with failure.
> 
> The issue is that /proc/mounts is unreliable.  It turns out that when
> you read it concurrently with a umount, you can end up missing records
> for mounts that have nothing to do with the ones that were unmounted.
> Arguably this is a kernel bug, but there is no simple fix, so
> /proc/mounts will definitely remain unreliable at least for squeeze.

This is rather annoying, but thanks for bringing it to my attention.

> Currently I'm working around the issue with a flock around the body of
> do_umount_all().  That's easy and is enough to address the problem
> when no other umounts are happening on the system.

If you would like this fix including, I'll be happy to accept a
patch implementing this.  Are you doing this in the shell script?
If it's possible to directly lock /proc/mounts with fcntl this could
be addressed directly in schroot-listmounts using sbuild::file_lock,
which wraps fcntl.

Has a bug report been filed against the Linux kernel or on the
kernel mailing list?  A fix in the kernel would be ideal, and they
should probably also be notified.

> A real fix would
> probably have to be for schroot to record for itself the list of
> mounts it makes inside a session's tree, and rely on that list instead
> of on /proc/mounts.

This wouldn't be robust enough, I'm afraid.  Processes inside the
chroot (or outside, for that matter) could mount (and umount)
filesystems after initial setup.  Mounts might also occur in
additional custom setup scripts.  As a result, we really do need
to get a definitive list of mounts at the session end in order to
clean up correctly.

At least on Linux, one other approach would be to make use of
per-process namespaces which would TTBOMK allow us to skip the
umounts completely since it's automatically cleaned up when the
last process in the namespace exits.  This has only recently
become viable without significant rearchitecting of schroot
though (IIRC it's now possible to connect to an existing namespace
without a parent-child relationship, though I'm not familiar with
the details).


Regards,
Roger

-- 
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?       http://gutenprint.sourceforge.net/
   `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.
[signature.asc (application/pgp-signature, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org, Debian buildd-tools Developers <buildd-tools-devel@lists.alioth.debian.org>:
Bug#593516; Package schroot. (Fri, 20 Aug 2010 00:57:06 GMT) Full text and rfc822 format available.

Acknowledgement sent to Greg Price <price@ksplice.com>:
Extra info received and forwarded to list. Copy sent to Debian buildd-tools Developers <buildd-tools-devel@lists.alioth.debian.org>. (Fri, 20 Aug 2010 00:57:06 GMT) Full text and rfc822 format available.

Message #15 received at 593516@bugs.debian.org (full text, mbox):

From: Greg Price <price@ksplice.com>
To: Roger Leigh <rleigh@codelibre.net>
Cc: 593516@bugs.debian.org
Subject: Re: [buildd-tools-devel] Bug#593516: schroot: race condition in /proc/mounts can break cleanup
Date: Thu, 19 Aug 2010 20:54:38 -0400
On Thu, Aug 19, 2010 at 3:21 PM, Roger Leigh <rleigh@codelibre.net> wrote:
>> Currently I'm working around the issue with a flock around the body of
>> do_umount_all().  That's easy and is enough to address the problem
>> when no other umounts are happening on the system.
>
> If you would like this fix including, I'll be happy to accept a
> patch implementing this.  Are you doing this in the shell script?
> If it's possible to directly lock /proc/mounts with fcntl this could
> be addressed directly in schroot-listmounts using sbuild::file_lock,
> which wraps fcntl.

Sure, patch copied below.  It's not possible for userspace to directly
lock /proc/mounts; the best we can do is an advisory lock that we
ourselves respect.  So I placed a flock around the invocations of both
schroot-listmounts and umount, which are the only such invocations in
the schroot source tree.


> Has a bug report been filed against the Linux kernel or on the
> kernel mailing list?  A fix in the kernel would be ideal, and they
> should probably also be notified.

I haven't filed such a bug.  I'm debating whether to send a message to
the LKML.  Honestly, I suspect this will follow many other well-loved
quirks in that it is difficult to fix, and the best solution to hope
for may be for more people to be aware of it and handle it with
strategies like the separate-namespace approach you suggest.  Maybe
I'll mail the LKML with a strawman patch, and see if someone can prove
me wrong. =)


>> A real fix would
>> probably have to be for schroot to record for itself the list of
>> mounts it makes inside a session's tree, and rely on that list instead
>> of on /proc/mounts.
>
> This wouldn't be robust enough, I'm afraid.  Processes inside the
> chroot (or outside, for that matter) could mount (and umount)
> filesystems after initial setup.  Mounts might also occur in
> additional custom setup scripts.  As a result, we really do need
> to get a definitive list of mounts at the session end in order to
> clean up correctly.

Oof.  Yes, I see.


> At least on Linux, one other approach would be to make use of
> per-process namespaces which would TTBOMK allow us to skip the
> umounts completely since it's automatically cleaned up when the
> last process in the namespace exits.  This has only recently
> become viable without significant rearchitecting of schroot
> though (IIRC it's now possible to connect to an existing namespace
> without a parent-child relationship, though I'm not familiar with
> the details).

That sounds like a potential solution.  If you could give a separate
mount namespace to each schroot session, that would definitely fix the
problem -- the list that /proc/mounts reads is per-namespace, so there
is no race between unmounting in one namespace and reading
/proc/mounts in another.

Greg



--- schroot-1.3.2.orig/etc/setup.d/10mount
+++ schroot-1.3.2/etc/setup.d/10mount
@@ -53,6 +53,7 @@
 do_umount_all()
 {
     if [ -d "$1" ]; then
+       ( flock 9
        mounts="$("$LIBEXEC_DIR/schroot-listmounts" -m "$1")"
        if [ "x$mounts" != 'x' ]; then
             echo "$mounts" |
@@ -63,6 +64,7 @@
                umount "$mountloc" || exit 1
             done || exit 1
        fi
+       ) 9>/var/lock/umount
     fi
 }




Bug 593516 cloned as bug 593883. Request was from Roger Leigh <rleigh@codelibre.net> to control@bugs.debian.org. (Sat, 21 Aug 2010 20:51:06 GMT) Full text and rfc822 format available.

Bug reassigned from package 'schroot' to 'linux-2.6'. Request was from Roger Leigh <rleigh@codelibre.net> to control@bugs.debian.org. (Sat, 21 Aug 2010 20:51:08 GMT) Full text and rfc822 format available.

Bug No longer marked as found in versions schroot/1.3.2-1. Request was from Roger Leigh <rleigh@codelibre.net> to control@bugs.debian.org. (Sat, 21 Aug 2010 20:51:09 GMT) Full text and rfc822 format available.

Changed Bug title to 'linux-2.6: race reading /proc/mounts causes umount failure and potential for serious dataloss' from 'schroot: race condition in /proc/mounts can break cleanup' Request was from Roger Leigh <rleigh@codelibre.net> to control@bugs.debian.org. (Sat, 21 Aug 2010 20:51:10 GMT) Full text and rfc822 format available.

Severity set to 'serious' from 'normal' Request was from Roger Leigh <rleigh@codelibre.net> to control@bugs.debian.org. (Sat, 21 Aug 2010 20:51:11 GMT) Full text and rfc822 format available.

Reply sent to Ben Hutchings <ben@decadent.org.uk>:
You have taken responsibility. (Sun, 03 Oct 2010 23:18:07 GMT) Full text and rfc822 format available.

Notification sent to Greg Price <price@ksplice.com>:
Bug acknowledged by developer. (Sun, 03 Oct 2010 23:18:08 GMT) Full text and rfc822 format available.

Message #30 received at 593883-done@bugs.debian.org (full text, mbox):

From: Ben Hutchings <ben@decadent.org.uk>
To: 593883-done@bugs.debian.org
Subject: Re: race reading /proc/mounts causes umount failure and potential for serious dataloss
Date: Mon, 04 Oct 2010 00:14:58 +0100
[Message part 1 (text/plain, inline)]
-------- Forwarded Message --------
From: Ben Hutchings <ben@decadent.org.uk>
To: 593516-done@bugs.debian.org
Subject: Re: race reading /proc/mounts causes umount failure and potential for serious dataloss
Date: Sun, 22 Aug 2010 02:28:43 +0100

On Sat, 2010-08-21 at 21:46 +0100, Roger Leigh wrote:
[...]
> I'm not aware of the specifics of how /proc/mounts is implemented, but
> I think this could be made reliable if the contents were not changed
> once opened (i.e. each reader gets a unique immutable copy rather than
> a mutable global copy that's subject to spurious changes).
[...]

And it would be a lot more convenient if directories worked like that
too.  Neither of them do, because it would require arbitrarily large
memory allocations in the kernel.

If you can't read the whole file in one go, allocate a larger buffer and
retry.  Repeat until successful.

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.

[signature.asc (application/pgp-signature, inline)]

Bug archived. Request was from Debbugs Internal Request <owner@bugs.debian.org> to internal_control@bugs.debian.org. (Mon, 01 Nov 2010 07:35:23 GMT) Full text and rfc822 format available.

Send a report that this bug log contains spam.


Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Sun Apr 20 19:24:47 2014; Machine Name: beach.debian.org

Debian Bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.