Debian Bug report logs - #739593
systemd makes / shared by default


Package: systemd; Maintainer for systemd is Debian systemd Maintainers <pkg-systemd-maintainers@lists.alioth.debian.org>; Source for systemd is src:systemd.

Reported by: Christoph Berg <christoph.berg@credativ.de>

Date: Thu, 20 Feb 2014 09:42:02 UTC

Severity: important

Found in version systemd/204-6

Done: Michael Stapelberg <stapelberg@debian.org>

Bug is archived. No further changes may be made.




Report forwarded to debian-bugs-dist@lists.debian.org, Debian Kernel Team <debian-kernel@lists.debian.org>:
Bug#739593; Package src:linux. (Thu, 20 Feb 2014 09:42:06 GMT) (full text, mbox, link).


Message #3 received at submit@bugs.debian.org (full text, mbox, reply):

From: Christoph Berg <christoph.berg@credativ.de>
To: Debian Bug Tracking System <submit@bugs.debian.org>
Subject: unshare -m aka unshare(CLONE_NEWNS) mounts are visible globally
Date: Thu, 20 Feb 2014 10:39:55 +0100
Source: linux
Version: 3.12.9-1
Severity: important

Mounts done in a unshare(CLONE_NEWNS) or unshare -m environment are
globally visible, and are not automatically removed once the process
exits:

$ mount | grep foobar
$ sudo unshare -m -- mount -t tmpfs foobar /tmp
$ mount | grep foobar
foobar on /tmp type tmpfs (rw,relatime)

This system is running systemd 204-6.

-- System Information:
Debian Release: jessie/sid
  APT prefers testing
  APT policy: (700, 'testing'), (150, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 3.12-1-amd64 (SMP w/8 CPU cores)
Locale: LANG=de_DE.utf8, LC_CTYPE=de_DE.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Christoph
-- 
cb@df7cb.de | http://www.df7cb.de/



Bug reassigned from package 'src:linux' to 'systemd'. Request was from Christoph Berg <christoph.berg@credativ.de> to control@bugs.debian.org. (Thu, 20 Feb 2014 10:15:09 GMT) (full text, mbox, link).


No longer marked as found in versions linux/3.12.9-1. Request was from Christoph Berg <christoph.berg@credativ.de> to control@bugs.debian.org. (Thu, 20 Feb 2014 10:15:10 GMT) (full text, mbox, link).


Severity set to 'grave' from 'important' Request was from Christoph Berg <christoph.berg@credativ.de> to control@bugs.debian.org. (Thu, 20 Feb 2014 10:15:11 GMT) (full text, mbox, link).


Marked as found in versions systemd/204-6. Request was from Christoph Berg <christoph.berg@credativ.de> to control@bugs.debian.org. (Thu, 20 Feb 2014 10:15:15 GMT) (full text, mbox, link).


Information forwarded to debian-bugs-dist@lists.debian.org, Debian systemd Maintainers <pkg-systemd-maintainers@lists.alioth.debian.org>:
Bug#739593; Package systemd. (Thu, 20 Feb 2014 13:36:05 GMT) (full text, mbox, link).


Acknowledgement sent to Bastian Blank <bastian.blank@credativ.de>:
Extra info received and forwarded to list. Copy sent to Debian systemd Maintainers <pkg-systemd-maintainers@lists.alioth.debian.org>. (Thu, 20 Feb 2014 13:36:05 GMT) (full text, mbox, link).


Message #16 received at 739593@bugs.debian.org (full text, mbox, reply):

From: Bastian Blank <bastian.blank@credativ.de>
To: 739593@bugs.debian.org
Subject: systemd makes / shared by default
Date: Thu, 20 Feb 2014 14:33:54 +0100
systemd remounts / as shared with the following comment:

| Mark the root directory as shared in regards to mount
| propagation. The kernel defaults to "private", but we think
| it makes more sense to have a default of "shared" so that
| nspawn and the container tools work out of the box. If
| specific setups need other settings they can reset the
| propagation mode to private if needed.
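
One way to confirm what systemd has done to / is to read the optional
fields in /proc/self/mountinfo, where the kernel tags a shared mount with
"shared:N" and a slave mount with "master:N". The following is a minimal
sketch in C, not code taken from systemd or util-linux; the function names
propagation_of and root_propagation are made up for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if `tag` occurs within the first `len` bytes of `line`. */
static int has_tag(const char *line, size_t len, const char *tag)
{
    size_t tlen = strlen(tag);
    for (size_t i = 0; i + tlen <= len; i++)
        if (memcmp(line + i, tag, tlen) == 0)
            return 1;
    return 0;
}

/*
 * Classify one /proc/self/mountinfo line by its optional fields, which
 * sit between the mount options and the " - " separator. A mount with
 * no propagation tag is private.
 */
static const char *propagation_of(const char *line)
{
    const char *sep = strstr(line, " - ");
    size_t len = sep ? (size_t)(sep - line) : strlen(line);
    int shared = has_tag(line, len, " shared:");
    int slave = has_tag(line, len, " master:");

    if (has_tag(line, len, " unbindable"))
        return "unbindable";
    if (shared && slave)
        return "shared+slave";
    if (shared)
        return "shared";
    if (slave)
        return "slave";
    return "private";
}

/* Report the propagation type of the mount currently visible at "/". */
static const char *root_propagation(void)
{
    char line[4096], mnt[4096];
    const char *result = "unknown";
    FILE *f = fopen("/proc/self/mountinfo", "r");

    if (!f)
        return result;
    while (fgets(line, sizeof line, f)) {
        /* Field 5 is the mount point; a later entry for "/" wins. */
        if (sscanf(line, "%*d %*d %*s %*s %4095s", mnt) == 1 &&
            strcmp(mnt, "/") == 0)
            result = propagation_of(line);
    }
    fclose(f);
    return result;
}
```

On a host where / has been remounted as described in the comment above,
root_propagation() would be expected to return "shared"; with the kernel
default it would be expected to return "private".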

Bastian

-- 
Bastian Blank
Berater                                   Durchwahl: +49 2161 / 4643-194
credativ GmbH, HRB Mönchengladbach 12080  Zentrale: +49 2161 / 4643-0
Hohenzollernstr. 133                      Fax: +49 2161 / 4643-100
D-41061 Mönchengladbach                   www: http://www.credativ.de
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer



Reply sent to Michael Stapelberg <stapelberg@debian.org>:
You have taken responsibility. (Sat, 22 Feb 2014 10:54:05 GMT) (full text, mbox, link).


Notification sent to Christoph Berg <christoph.berg@credativ.de>:
Bug acknowledged by developer. (Sat, 22 Feb 2014 10:54:05 GMT) (full text, mbox, link).


Message #21 received at 739593-done@bugs.debian.org (full text, mbox, reply):

From: Michael Stapelberg <stapelberg@debian.org>
To: Bastian Blank <bastian.blank@credativ.de>, 739593-done@bugs.debian.org
Subject: Re: [Pkg-systemd-maintainers] Bug#739593: systemd makes / shared by default
Date: Sat, 22 Feb 2014 11:50:02 +0100
Hi,

Bastian Blank <bastian.blank@credativ.de> writes:
> systemd remounts / as shared with the following comment:
>
> | Mark the root directory as shared in regards to mount
> | propagation. The kernel defaults to "private", but we think
> | it makes more sense to have a default of "shared" so that
> | nspawn and the container tools work out of the box. If
> | specific setups need other settings they can reset the
> | propagation mode to private if needed.
As Bastian notes, this is intended behavior. Closing the bug therefore.

-- 
Best regards,
Michael



Message #22 received at 739593-done@bugs.debian.org (full text, mbox, reply):

From: Bastian Blank <bastian.blank@credativ.de>
To: Michael Stapelberg <stapelberg@debian.org>
Cc: 739593-done@bugs.debian.org
Subject: Re: [Pkg-systemd-maintainers] Bug#739593: systemd makes / shared by default
Date: Mon, 24 Feb 2014 09:18:59 +0100
Control: reopen 739593

On Sat, Feb 22, 2014 at 11:50:02AM +0100, Michael Stapelberg wrote:
> Bastian Blank <bastian.blank@credativ.de> writes:
> > systemd remounts / as shared with the following comment:
> >
> > | Mark the root directory as shared in regards to mount
> > | propagation. The kernel defaults to "private", but we think
> > | it makes more sense to have a default of "shared" so that
> > | nspawn and the container tools work out of the box. If
> > | specific setups need other settings they can reset the
> > | propagation mode to private if needed.
> As Bastian notes, this is intended behavior. Closing the bug therefore.

Please speak to us kernel maintainers if you think the default behaviour
is wrong. Re-opening as this is global state.

Bastian

-- 
Bastian Blank
Berater                                   Durchwahl: +49 2161 / 4643-194
credativ GmbH, HRB Mönchengladbach 12080  Zentrale: +49 2161 / 4643-0
Hohenzollernstr. 133                      Fax: +49 2161 / 4643-100
D-41061 Mönchengladbach                   www: http://www.credativ.de
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer



Bug reopened Request was from Bastian Blank <bastian.blank@credativ.de> to control@bugs.debian.org. (Mon, 24 Feb 2014 11:04:33 GMT) (full text, mbox, link).


Information forwarded to debian-bugs-dist@lists.debian.org, Debian systemd Maintainers <pkg-systemd-maintainers@lists.alioth.debian.org>:
Bug#739593; Package systemd. (Mon, 24 Feb 2014 11:51:04 GMT) (full text, mbox, link).


Message #27 received at 739593@bugs.debian.org (full text, mbox, reply):

From: Christoph Berg <christoph.berg@credativ.de>
To: 739593@bugs.debian.org
Subject: Re: Bug#739593 closed by Michael Stapelberg <stapelberg@debian.org> (Re: [Pkg-systemd-maintainers] Bug#739593: systemd makes / shared by default)
Date: Mon, 24 Feb 2014 12:49:04 +0100
> > systemd remounts / as shared with the following comment:
> >
> > | Mark the root directory as shared in regards to mount
> > | propagation. The kernel defaults to "private", but we think
> > | it makes more sense to have a default of "shared" so that
> > | nspawn and the container tools work out of the box. If
> > | specific setups need other settings they can reset the
> > | propagation mode to private if needed.
> As Bastian notes, this is intended behavior. Closing the bug therefore.

To put in more context here: the problem is with the postgresql
testsuite in /usr/share/postgresql-common/{testsuite,t} .

It puts tmpfs mounts over /etc/postgresql, /etc/postgresql-common,
/var/lib/postgresql and /var/log/postgresql in a CLONE_NEWNS environment.
Once the tests are done, the tmpfs mounts go out of scope and no trash
from the tests is left behind on the host system. These tests are also
run from autopkgtest.

I don't think we as PostgreSQL maintainers should be messing with the
configuration of / to enable the testsuite.

Please consider reverting to the kernel default.

Christoph
-- 
cb@df7cb.de | http://www.df7cb.de/

Information forwarded to debian-bugs-dist@lists.debian.org, Debian systemd Maintainers <pkg-systemd-maintainers@lists.alioth.debian.org>:
Bug#739593; Package systemd. (Mon, 24 Feb 2014 12:30:05 GMT) (full text, mbox, link).


Acknowledgement sent to Sam Morris <sam@robots.org.uk>:
Extra info received and forwarded to list. Copy sent to Debian systemd Maintainers <pkg-systemd-maintainers@lists.alioth.debian.org>. (Mon, 24 Feb 2014 12:30:05 GMT) (full text, mbox, link).


Message #32 received at 739593@bugs.debian.org (full text, mbox, reply):

From: Sam Morris <sam@robots.org.uk>
To: Christoph Berg <christoph.berg@credativ.de>, 739593@bugs.debian.org
Subject: Re: Bug#739593: closed by Michael Stapelberg <stapelberg@debian.org> (Re: [Pkg-systemd-maintainers] Bug#739593: systemd makes / shared by default)
Date: Mon, 24 Feb 2014 12:27:51 +0000
On Mon, Feb 24, 2014 at 12:49:04PM +0100, Christoph Berg wrote:
> To put in more context here: the problem is with the postgresql
> testsuite in /usr/share/postgresql-common/{testsuite,t} .
> 
> It puts tmpfs mounts over /etc/postgresql, /etc/postgresql-common,
> /var/lib/postgresql and /var/log/postgresql in a CLONE_NEWNS environment.
> Once the tests are done, the tmpfs mounts go out of scope and no trash
> from the tests is left behind on the host system. These tests are also
> run from autopkgtest.
> 
> I don't think we as PostgreSQL maintainers should be messing with the
> configuration of / to enable the testsuite.

The tests will still break if the admin has set shared propagation on /.
In my own code that uses CLONE_NEWNS for the same reason, I explicitly
run 'mount --make-rprivate /' to bring the propagation settings into a
known desired state; I suggest that others do the same.

-- 
Sam Morris <https://robots.org.uk/>
3412 EA18 1277 354B 991B  C869 B219 7FDB 5EA0 1078



Information forwarded to debian-bugs-dist@lists.debian.org, Debian systemd Maintainers <pkg-systemd-maintainers@lists.alioth.debian.org>:
Bug#739593; Package systemd. (Mon, 24 Feb 2014 14:45:07 GMT) (full text, mbox, link).


Message #35 received at 739593@bugs.debian.org (full text, mbox, reply):

From: Christoph Berg <christoph.berg@credativ.de>
To: Sam Morris <sam@robots.org.uk>
Cc: 739593@bugs.debian.org
Subject: Re: Bug#739593: closed by Michael Stapelberg <stapelberg@debian.org> (Re: [Pkg-systemd-maintainers] Bug#739593: systemd makes / shared by default)
Date: Mon, 24 Feb 2014 15:43:07 +0100
Control: severity -1 important
Control: retitle -1 systemd makes / shared by default

Re: Sam Morris 2014-02-24 <20140224122751.GA7788@traxus.robots.org.uk>
> > I don't think we as PostgreSQL maintainers should be messing with the
> > configuration of / to enable the testsuite.
> 
> The tests will still break if the admin has set shared propagation on /.
> In my own code that uses CLONE_NEWNS for the same reason, I explicitly
> run 'mount --make-rprivate /' to bring the propagation settings into a
> known desired state; I suggest that others do the same.

The bit I was missing here is that I can run "mount --make-rprivate /"
*inside* the CLONE_NEWNS namespace, so that it doesn't modify the
system's global state, but just what I am seeing. (Does anyone
actually understand these semantics?!)

We can put that into our unshare -m scripts, so I guess the problem is
solved for us, but still, the question remains if systemd should
override the kernel default here. (Hence downgrading the bug.)

Mit freundlichen Grüßen,
Christoph Berg
-- 
Senior Berater, Tel.: +49 (0)21 61 / 46 43-187
credativ GmbH, HRB Mönchengladbach 12080, USt-ID-Nummer: DE204566209
Hohenzollernstr. 133, 41061 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
pgp fingerprint: 5C48 FE61 57F4 9179 5970  87C6 4C5A 6BAB 12D2 A7AE

Severity set to 'important' from 'grave' Request was from Christoph Berg <christoph.berg@credativ.de> to 739593-submit@bugs.debian.org. (Mon, 24 Feb 2014 14:45:07 GMT) (full text, mbox, link).


Changed Bug title to 'systemd makes / shared by default' from 'unshare -m aka unshare(CLONE_NEWNS) mounts are visible globally' Request was from Christoph Berg <christoph.berg@credativ.de> to 739593-submit@bugs.debian.org. (Mon, 24 Feb 2014 14:45:09 GMT) (full text, mbox, link).


Information forwarded to debian-bugs-dist@lists.debian.org, Debian systemd Maintainers <pkg-systemd-maintainers@lists.alioth.debian.org>:
Bug#739593; Package systemd. (Mon, 24 Feb 2014 16:45:05 GMT) (full text, mbox, link).


Acknowledgement sent to Sam Morris <sam@robots.org.uk>:
Extra info received and forwarded to list. Copy sent to Debian systemd Maintainers <pkg-systemd-maintainers@lists.alioth.debian.org>. (Mon, 24 Feb 2014 16:45:05 GMT) (full text, mbox, link).


Message #44 received at 739593@bugs.debian.org (full text, mbox, reply):

From: Sam Morris <sam@robots.org.uk>
To: Christoph Berg <christoph.berg@credativ.de>, 739593@bugs.debian.org
Subject: Re: Bug#739593: closed by Michael Stapelberg <stapelberg@debian.org> (Re: [Pkg-systemd-maintainers] Bug#739593: systemd makes / shared by default)
Date: Mon, 24 Feb 2014 16:43:59 +0000
On Mon, Feb 24, 2014 at 03:43:07PM +0100, Christoph Berg wrote:
> Control: severity -1 important
> Control: retitle -1 systemd makes / shared by default
> 
> Re: Sam Morris 2014-02-24 <20140224122751.GA7788@traxus.robots.org.uk>
> > > I don't think we as PostgreSQL maintainers should be messing with the
> > > configuration of / to enable the testsuite.
> > 
> > The tests will still break if the admin has set shared propagation on /.
> > In my own code that uses CLONE_NEWNS for the same reason, I explicitly
> > run 'mount --make-rprivate /' to bring the propagation settings into a
> > known desired state; I suggest that others do the same.
> 
> The bit I was missing here is that I can run "mount --make-rprivate /"
> *inside* the CLONE_NEWNS namespace, so that it doesn't modify the
> system's global state, but just what I am seeing. (Does anyone
> actually understand these semantics?!)

I think I had to read sharedsubtree.txt about a dozen times before I
understood it, so you're not the only one left wanting better
documentation. :)


-- 
Sam Morris <https://robots.org.uk/>
3412 EA18 1277 354B 991B  C869 B219 7FDB 5EA0 1078



Information forwarded to debian-bugs-dist@lists.debian.org, Debian systemd Maintainers <pkg-systemd-maintainers@lists.alioth.debian.org>:
Bug#739593; Package systemd. (Thu, 27 Feb 2014 09:39:05 GMT) (full text, mbox, link).


Acknowledgement sent to Michael Stapelberg <stapelberg@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian systemd Maintainers <pkg-systemd-maintainers@lists.alioth.debian.org>. (Thu, 27 Feb 2014 09:39:05 GMT) (full text, mbox, link).


Message #49 received at 739593@bugs.debian.org (full text, mbox, reply):

From: Michael Stapelberg <stapelberg@debian.org>
To: Sam Morris <sam@robots.org.uk>, 739593@bugs.debian.org, Christoph Berg <christoph.berg@credativ.de>, "Lennart Poettering" <lennart@poettering.net>
Subject: Re: [Pkg-systemd-maintainers] Bug#739593: closed by Michael Stapelberg <stapelberg@debian.org> (Re: Bug#739593: systemd makes / shared by default)
Date: Thu, 27 Feb 2014 10:36:52 +0100
[+lennart]

Hi,

Sam Morris <sam@robots.org.uk> writes:

> On Mon, Feb 24, 2014 at 03:43:07PM +0100, Christoph Berg wrote:
>> Control: severity -1 important
>> Control: retitle -1 systemd makes / shared by default
>> 
>> Re: Sam Morris 2014-02-24 <20140224122751.GA7788@traxus.robots.org.uk>
>> > > I don't think we as PostgreSQL maintainers should be messing with the
>> > > configuration of / to enable the testsuite.
>> > 
>> > The tests will still break if the admin has set shared propagation on /.
>> > In my own code that uses CLONE_NEWNS for the same reason, I explicitly
>> > run 'mount --make-rprivate /' to bring the propagation settings into a
>> > known desired state; I suggest that others do the same.
>> 
>> The bit I was missing here is that I can run "mount --make-rprivate /"
>> *inside* the CLONE_NEWNS namespace, so that it doesn't modify the
>> system's global state, but just what I am seeing. (Does anyone
>> actually understand these semantics?!)
>
> I think I had to read sharedsubtree.txt about a dozen times before I
> understood it, so you're not the only one left wanting better
> documentation. :)
Lennart, we are considering disabling the code in systemd which makes /
shared by default so that we follow the kernel default.

I’d be interested in your comments on that, especially in the context of
this bugreport (see http://bugs.debian.org/739593 for full history).

-- 
Best regards,
Michael



Information forwarded to debian-bugs-dist@lists.debian.org, Debian systemd Maintainers <pkg-systemd-maintainers@lists.alioth.debian.org>:
Bug#739593; Package systemd. (Thu, 27 Feb 2014 19:03:09 GMT) (full text, mbox, link).


Acknowledgement sent to Lennart Poettering <lennart@poettering.net>:
Extra info received and forwarded to list. Copy sent to Debian systemd Maintainers <pkg-systemd-maintainers@lists.alioth.debian.org>. (Thu, 27 Feb 2014 19:03:09 GMT) (full text, mbox, link).


Message #54 received at 739593@bugs.debian.org (full text, mbox, reply):

From: Lennart Poettering <lennart@poettering.net>
To: Michael Stapelberg <stapelberg@debian.org>
Cc: Sam Morris <sam@robots.org.uk>, 739593@bugs.debian.org, Christoph Berg <christoph.berg@credativ.de>
Subject: Re: [Pkg-systemd-maintainers] Bug#739593: closed by Michael Stapelberg <stapelberg@debian.org> (Re: Bug#739593: systemd makes / shared by default)
Date: Thu, 27 Feb 2014 19:50:39 +0100
On Thu, 27.02.14 10:36, Michael Stapelberg (stapelberg@debian.org) wrote:

> >> The bit I was missing here is that I can run "mount --make-rprivate /"
> >> *inside* the CLONE_NEWNS namespace, so that it doesn't modify the
> >> system's global state, but just what I am seeing. (Does anyone
> >> actually understand these semantics?!)
> >
> > I think I had to read sharedsubtree.txt about a dozen times before I
> > understood it, so you're not the only one left wanting better
> > documentation. :)
>
> Lennart, we are considering disabling the code in systemd which makes /
> shared by default so that we follow the kernel default.

Hmm? Why would you do that?

> I’d be interested in your comments on that, especially in the context of
> this bugreport (see http://bugs.debian.org/739593 for full history).

If you open your own mount namespace and don't want propagation, then
turn off propagation, by remounting the root dir inside the namespace
with MS_REC|MS_SLAVE or suchlike.

We turned the default from PRIVATE to SHARED on request of the container
and security guys, since they want that if you mount something from the
host into a subdir of the container, it should just appear there,
because that's what most people would most likely expect. Or, if you use
something like pam_namespace to give users a private /tmp, they should
otherwise see all the mounts popping up/removed as normal.

The kernel default for this is unlikely to change since they argue that
it breaks compatibility, which I kinda agree with. In systemd however, we
thought we'd better pick saner defaults.

I'd strongly recommend not to patch this in Debian. First of all, you'd
break a lot of stuff when using containers, where suddenly mounts on the
host wouldn't propagate anymore to containers, or where using
pam_namespace for /tmp could not work anymore, which would certainly be
confusing. But more importantly you don't actually "fix" anything. You
just switch defaults, and with the new default your specific case might
start working, but for everybody else who changed the default things
would still be broken. And since disassociation is a one-way street, if
you globally disassociate everything you can never reassociate things...

Or to explain this differently:

a) With the default of MS_SHARED for the root dir like systemd sets it up,
   you enable propagation to containers, and those who don't want the
   propagation can opt-out of it for their specific namespace.

   Advantage: you cover all usecases with the default setting. All
   programs will work with both of MS_SHARED and MS_PRIVATE set for /.

   Disadvantage: you might need to patch a package or two to properly
   disassociate their namespace from the host by remounting the root dir
   inside of the namespace with MS_REC|MS_SLAVE as described above.

b) If you patch systemd to go back to MS_PRIVATE for the root dir, you
   disable propagation to containers, and nobody can opt-in to it anymore
   for their specific namespace. 

   Advantage: you don't have to patch those few programs which
   currently assume the root dir is MS_PRIVATE and don't disassociate
   things.

   Disadvantage: the apps are still broken for those who switch to
   MS_SHARED for /. You hence only cover the usecases where people do
   not disassociate. You break the usecase where people want the
   propagation to take place.

TL;DR: fix the individual processes locally to disassociate their
namespaces. Don't tape over it by making all of them disassociate by
default, breaking those which do not want to disassociate. Because after
disassociation there is no way back.
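
What "remounting the root dir inside the namespace with MS_REC|MS_SLAVE"
looks like in C can be sketched roughly as follows. This is an
illustration under assumptions, not a patch for any particular package:
the function name isolate_mounts is hypothetical, and the calls need
CAP_SYS_ADMIN.

```c
#define _GNU_SOURCE /* for unshare() and CLONE_NEWNS; must precede includes */
#include <errno.h>
#include <sched.h>
#include <sys/mount.h>

/*
 * Move the calling process into its own mount namespace and change the
 * propagation type of every mount in it.
 *
 * `mode` must be MS_PRIVATE (no propagation in either direction) or
 * MS_SLAVE (host mounts still show up here, local mounts stay local).
 * Returns 0 on success, -1 with errno set on failure.
 */
static int isolate_mounts(unsigned long mode)
{
    if (mode != MS_PRIVATE && mode != MS_SLAVE) {
        errno = EINVAL;
        return -1;
    }

    /* New namespace: starts as a copy of the parent's mount tree. */
    if (unshare(CLONE_NEWNS) != 0)
        return -1;

    /*
     * Same effect as "mount --make-rprivate /" or "mount --make-rslave /".
     * For a propagation change the source, filesystem type and data
     * arguments are ignored; only the target and the flags matter.
     */
    if (mount("none", "/", NULL, MS_REC | mode, NULL) != 0)
        return -1;

    return 0;
}
```

A program that used to rely on the private default could call
isolate_mounts(MS_PRIVATE), or isolate_mounts(MS_SLAVE) to keep receiving
host mounts, before issuing its own mount() calls. Because this runs after
unshare(), it only changes the new namespace and leaves the host's
propagation settings alone.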

Oh, and of course, in Fedora and RHEL we'll stick to the MS_SHARED
defaults. Sooner or later we'll patch through all software that assumes
that MS_PRIVATE was the default... Hence, sooner or later we'll fix all
these things for you anyway...

Hope this makes some sense...

Lennart

-- 
Lennart Poettering, Red Hat



Information forwarded to debian-bugs-dist@lists.debian.org, Debian systemd Maintainers <pkg-systemd-maintainers@lists.alioth.debian.org>:
Bug#739593; Package systemd. (Fri, 28 Feb 2014 05:54:04 GMT) (full text, mbox, link).


Acknowledgement sent to Martin Pitt <mpitt@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian systemd Maintainers <pkg-systemd-maintainers@lists.alioth.debian.org>. (Fri, 28 Feb 2014 05:54:04 GMT) (full text, mbox, link).


Message #59 received at 739593@bugs.debian.org (full text, mbox, reply):

From: Martin Pitt <mpitt@debian.org>
To: Lennart Poettering <lennart@poettering.net>, 739593@bugs.debian.org
Cc: Michael Stapelberg <stapelberg@debian.org>, Christoph Berg <christoph.berg@credativ.de>, Sam Morris <sam@robots.org.uk>
Subject: Re: [Pkg-systemd-maintainers] Bug#739593: Bug#739593: closed by Michael Stapelberg <stapelberg@debian.org> (Re: Bug#739593: systemd makes / shared by default)
Date: Fri, 28 Feb 2014 06:51:35 +0100
Hey all,

Lennart Poettering [2014-02-27 19:50 +0100]:
> > Lennart, we are considering disabling the code in systemd which makes /
> > shared by default so that we follow the kernel default.
> 
> Hmm? Why would you do that?

TBH I found it rather unexpected from systemd to suddenly change that
kernel default, as it has worked the other way for years. Now we are
between a rock and a hard place: Installing systemd breaks existing
stuff that relies on unshared name spaces being private, and patching
it back out breaks applications which rely on the new systemd
behaviour.

> We turned the default from PRIVATE to SHARED on request of the container
> and security guys, since they want that if you mount something from the
> host into a subdir of the container, it should just appear there,
> because that's what most people would most likely expect.

Well, but conversely what scripts/people expected before that change
was that something that you run under "unshare -m" really actually did
what it says on the tin, namely that it really *does* have its private
mount name space. Now it doesn't, and mounts done in that unshared
process affect the system outside of it. I.e. all such programs now
have to be changed to do that "mount --make-rprivate /" dance.

> The kernel default for this is unlikely to change since they argue that
> it breaks compatibility, which I kinda agree with. In systemd however, we
> thought we'd better pick saner defaults.

That has the same net effect though, changing the global default?
systemd and the kernel shouldn't have two different defaults,
otherwise we'll eternally have scripts and programs with different
expectations.

> TL;DR: fix the individual processes locally to disassociate their
> namespaces. Don't tape over it by making all of them disassociate by
> default, breaking those which do not want to disassociate. Because after
> disassociation there is no way back.

I agree that due to this symmetric behaviour of unsharing (which is
really counterintuitive and broken at first sight, but I guess it's
technically difficult to implement in a proper host/guest fashion) we
really shouldn't patch that back in Debian, and just live with the
fallout (and find and fix it over time), as there is no way back as
you explained.

Perhaps as a mitigation /usr/bin/unshare could be fixed to imply
making the unshared namespace private, so that this behaviour
continues as it did before. And of course the kernel should then also
default to the new behaviour, otherwise we have an eternal
inconsistency there and a default which nobody actually uses.

Thanks for your explanations!

Martin

-- 
Martin Pitt                        | http://www.piware.de
Ubuntu Developer (www.ubuntu.com)  | Debian Developer  (www.debian.org)



Information forwarded to debian-bugs-dist@lists.debian.org, Debian systemd Maintainers <pkg-systemd-maintainers@lists.alioth.debian.org>:
Bug#739593; Package systemd. (Fri, 28 Feb 2014 13:15:08 GMT) (full text, mbox, link).


Acknowledgement sent to Lennart Poettering <lennart@poettering.net>:
Extra info received and forwarded to list. Copy sent to Debian systemd Maintainers <pkg-systemd-maintainers@lists.alioth.debian.org>. (Fri, 28 Feb 2014 13:15:08 GMT) (full text, mbox, link).


Message #64 received at 739593@bugs.debian.org (full text, mbox, reply):

From: Lennart Poettering <lennart@poettering.net>
To: Martin Pitt <mpitt@debian.org>
Cc: 739593@bugs.debian.org, Michael Stapelberg <stapelberg@debian.org>, Christoph Berg <christoph.berg@credativ.de>, Sam Morris <sam@robots.org.uk>
Subject: Re: [Pkg-systemd-maintainers] Bug#739593: Bug#739593: closed by Michael Stapelberg <stapelberg@debian.org> (Re: Bug#739593: systemd makes / shared by default)
Date: Fri, 28 Feb 2014 14:13:25 +0100
On Fri, 28.02.14 06:51, Martin Pitt (mpitt@debian.org) wrote:

> > We turned the default from PRIVATE to SHARED on request of the container
> > and security guys, since they want that if you mount something from the
> > host into a subdir of the container, it should just appear there,
> > because that's what most people would most likely expect.
> 
> Well, but conversely what scripts/people expected before that change
> was that something that you run under "unshare -m" really actually did
> what it says on the tin, namely that it really *does* have its private
> mount name space. Now it doesn't, and mounts done in that unshared
> process affect the system outside of it. I.e. all such programs now
> have to be changed to do that "mount --make-rprivate /" dance.

I have talked to Karel, he's thinking about adding
--propagation=slave|shared|private|none to unshare -m now, with a
default of "slave". Please ping him on IRC or so, so that he sees that
there is demand for that. With that change "unshare -m" should work for
everybody the same.
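
For illustration, the option values mentioned above could map to mount(2)
flags roughly as below. This is a hypothetical sketch, not util-linux
code: the function name propagation_flags is invented, and treating
"none" as "leave propagation untouched" is an assumption about what the
option would mean.

```c
#include <string.h>
#include <sys/mount.h>

/*
 * Hypothetical helper: translate a --propagation argument into the flags
 * that would be passed to mount("none", "/", NULL, flags, NULL).
 * Returns 0 for "none" (assumed to mean: do not touch propagation) and
 * -1 for an unrecognised value.
 */
static long propagation_flags(const char *arg)
{
    if (strcmp(arg, "slave") == 0)
        return MS_REC | MS_SLAVE;
    if (strcmp(arg, "private") == 0)
        return MS_REC | MS_PRIVATE;
    if (strcmp(arg, "shared") == 0)
        return MS_REC | MS_SHARED;
    if (strcmp(arg, "none") == 0)
        return 0;
    return -1;
}
```

MS_REC is included in every case so that the change applies to the whole
tree below /, matching the recursive "--make-r*" variants of mount(8).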

> > The kernel default for this is unlikely to change since they argue that
> > it breaks compatibility, which I kinda agree with. In systemd however, we
> > thought we'd better pick saner defaults.
> 
> That has the same net effect though, changing the global default?
> systemd and the kernel shouldn't have two different defaults,
> otherwise we'll eternally have scripts and programs with different
> expectations.

Well, we don't provide 100% compat anyway, just 99%. We are pretty sure
that the "shared" default makes a lot of sense though and that apps
that need their private setups need to be fixed anyway, so we took the
liberty to switch here, better earlier than later. Of course, that
worked for us quite well, since we already did this change 3 years ago, when
people probably didn't assume things about "unshare -m" so much yet...

Lennart

-- 
Lennart Poettering, Red Hat



Reply sent to Michael Stapelberg <stapelberg@debian.org>:
You have taken responsibility. (Fri, 28 Feb 2014 16:57:20 GMT) (full text, mbox, link).


Notification sent to Christoph Berg <christoph.berg@credativ.de>:
Bug acknowledged by developer. (Fri, 28 Feb 2014 16:57:20 GMT) (full text, mbox, link).


Message #69 received at 739593-done@bugs.debian.org (full text, mbox, reply):

From: Michael Stapelberg <stapelberg@debian.org>
To: Lennart Poettering <lennart@poettering.net>, Martin Pitt <mpitt@debian.org>
Cc: 739593-done@bugs.debian.org, Christoph Berg <christoph.berg@credativ.de>, Sam Morris <sam@robots.org.uk>
Subject: Re: [Pkg-systemd-maintainers] Bug#739593: Bug#739593: closed by Michael Stapelberg <stapelberg@debian.org> (Re: Bug#739593: systemd makes / shared by default)
Date: Fri, 28 Feb 2014 17:56:09 +0100
Hi Lennart,

Thanks for your comments, what you are describing makes sense to me. I
don’t think there is anything left to do for the Debian systemd
maintainers here, so I’ll close the bug.

For the people affected by this, please open separate bugs against the
affected packages to get the software fixed which makes assumptions
about shared/private-ness that don’t necessarily hold. Thanks.

-- 
Best regards,
Michael



Information forwarded to debian-bugs-dist@lists.debian.org, Debian systemd Maintainers <pkg-systemd-maintainers@lists.alioth.debian.org>:
Bug#739593; Package systemd. (Tue, 04 Mar 2014 22:27:13 GMT) (full text, mbox, link).


Message #72 received at 739593@bugs.debian.org (full text, mbox, reply):

From: Christoph Berg <myon@debian.org>
To: Michael Stapelberg <stapelberg@debian.org>
Cc: Lennart Poettering <lennart@poettering.net>, Martin Pitt <mpitt@debian.org>, 739593@bugs.debian.org, Sam Morris <sam@robots.org.uk>
Subject: Re: [Pkg-systemd-maintainers] Bug#739593: Bug#739593: closed by Michael Stapelberg <stapelberg@debian.org> (Re: Bug#739593: systemd makes / shared by default)
Date: Tue, 4 Mar 2014 23:24:34 +0100
Re: Michael Stapelberg 2014-02-28 <x6iorznpli.fsf@midna.zekjur.net>
> For the people affected by this, please open separate bugs against the
> affected packages to get the software fixed which makes assumptions
> about shared/private-ness that don’t necessarily hold. Thanks.

Pardon this question, but how do I do this "mount --make-rprivate /"
in C? I've tried stracing mount:

mount("/dev/mapper/Debian-root", "/", "none", MS_REC|MS_PRIVATE, "errors=remount-ro,discard") = 0

... but every combination of MS_REMOUNT MS_REC MS_PRIVATE I've tried
on / and /proc leaves me with a /proc in the main system that simply
doesn't have any pid dirs.

The program in question is newpid:
https://github.com/ChristophBerg/newpid/blob/master/newpid.c

... with an extra chunk before mount(proc):

        if (mount ("", "/", "", MS_REC|MS_PRIVATE, NULL) != 0) {
                perror ("mount /");
                exit (1);
        }

Am I supposed to parse the current mount flags etc from /proc/mounts
before doing that?

Christoph
-- 
cb@df7cb.de | http://www.df7cb.de/
[signature.asc (application/pgp-signature, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org, Debian systemd Maintainers <pkg-systemd-maintainers@lists.alioth.debian.org>:
Bug#739593; Package systemd. (Sat, 08 Mar 2014 21:09:04 GMT) (full text, mbox, link).


Acknowledgement sent to Michael Stapelberg <stapelberg@debian.org>:
Extra info received and forwarded to list. Copy sent to Debian systemd Maintainers <pkg-systemd-maintainers@lists.alioth.debian.org>. (Sat, 08 Mar 2014 21:09:05 GMT) (full text, mbox, link).


Message #77 received at 739593@bugs.debian.org (full text, mbox, reply):

From: Michael Stapelberg <stapelberg@debian.org>
To: Christoph Berg <myon@debian.org>
Cc: Lennart Poettering <lennart@poettering.net>, Martin Pitt <mpitt@debian.org>, 739593@bugs.debian.org, Sam Morris <sam@robots.org.uk>
Subject: Re: [Pkg-systemd-maintainers] Bug#739593: Bug#739593: closed by Michael Stapelberg <stapelberg@debian.org> (Re: Bug#739593: systemd makes / shared by default)
Date: Sat, 08 Mar 2014 22:04:45 +0100
Hi Christoph,

Christoph Berg <myon@debian.org> writes:
> Pardon this question, but how do I do this "mount --make-rprivate /"
> in C? I've tried stracing mount:
The following patch works for me to make your newpid program work:

--- i/newpid.c
+++ w/newpid.c
@@ -40,11 +40,9 @@ run (void *argv_void)
        pid_t child;
        pid_t pid;
 
-       if (umount ("/proc") != 0) {
-               /* ignore errors here, /proc could be busy
-               perror ("umount /proc");
+       if (mount("none", "/proc", NULL, MS_PRIVATE|MS_REC, NULL) != 0) {
+               perror ("remount proc private");
                exit (1);
-               */
        }
 
        if (mount ("proc", "/proc", "proc", 0, NULL) != 0) {

I took this from
https://git.kernel.org/cgit/utils/util-linux/util-linux.git/tree/sys-utils/unshare.c#n182

-- 
Best regards,
Michael



Information forwarded to debian-bugs-dist@lists.debian.org, Debian systemd Maintainers <pkg-systemd-maintainers@lists.alioth.debian.org>:
Bug#739593; Package systemd. (Tue, 11 Mar 2014 13:18:04 GMT) (full text, mbox, link).


Message #80 received at 739593@bugs.debian.org (full text, mbox, reply):

From: Christoph Berg <myon@debian.org>
To: Michael Stapelberg <stapelberg@debian.org>
Cc: Lennart Poettering <lennart@poettering.net>, Martin Pitt <mpitt@debian.org>, 739593@bugs.debian.org, Sam Morris <sam@robots.org.uk>
Subject: Re: [Pkg-systemd-maintainers] Bug#739593: Bug#739593: closed by Michael Stapelberg <stapelberg@debian.org> (Re: Bug#739593: systemd makes / shared by default)
Date: Tue, 11 Mar 2014 14:14:31 +0100
[Message part 1 (text/plain, inline)]
Re: Michael Stapelberg 2014-03-08 <x6d2hwflle.fsf@midna.zekjur.net>
> The following patch works for me to make your newpid program work:
> 
> --- i/newpid.c
> +++ w/newpid.c
> @@ -40,11 +40,9 @@ run (void *argv_void)
>         pid_t child;
>         pid_t pid;
>  
> -       if (umount ("/proc") != 0) {
> -               /* ignore errors here, /proc could be busy
> -               perror ("umount /proc");
> +       if (mount("none", "/proc", NULL, MS_PRIVATE|MS_REC, NULL) != 0)

Ok, that works, thanks! I only tried to remount / which didn't seem to
have any effect.

Unfortunately MS_PRIVATE and MS_REC are not defined in squeeze, so
that fix won't work for chroots running on a systemd system, but
that's something I should be able to work around.

Christoph
-- 
cb@df7cb.de | http://www.df7cb.de/
[signature.asc (application/pgp-signature, inline)]

Information forwarded to debian-bugs-dist@lists.debian.org, Debian systemd Maintainers <pkg-systemd-maintainers@lists.alioth.debian.org>:
Bug#739593; Package systemd. (Tue, 11 Mar 2014 13:45:08 GMT) (full text, mbox, link).


Acknowledgement sent to Lennart Poettering <lennart@poettering.net>:
Extra info received and forwarded to list. Copy sent to Debian systemd Maintainers <pkg-systemd-maintainers@lists.alioth.debian.org>. (Tue, 11 Mar 2014 13:45:08 GMT) (full text, mbox, link).


Message #85 received at 739593@bugs.debian.org (full text, mbox, reply):

From: Lennart Poettering <lennart@poettering.net>
To: Christoph Berg <myon@debian.org>, Michael Stapelberg <stapelberg@debian.org>, Martin Pitt <mpitt@debian.org>, 739593@bugs.debian.org, Sam Morris <sam@robots.org.uk>
Subject: Re: [Pkg-systemd-maintainers] Bug#739593: Bug#739593: closed by Michael Stapelberg <stapelberg@debian.org> (Re: Bug#739593: systemd makes / shared by default)
Date: Tue, 11 Mar 2014 14:41:18 +0100
On Tue, 11.03.14 14:14, Christoph Berg (myon@debian.org) wrote:

> Re: Michael Stapelberg 2014-03-08 <x6d2hwflle.fsf@midna.zekjur.net>
> > The following patch works for me to make your newpid program work:
> > 
> > --- i/newpid.c
> > +++ w/newpid.c
> > @@ -40,11 +40,9 @@ run (void *argv_void)
> >         pid_t child;
> >         pid_t pid;
> >  
> > -       if (umount ("/proc") != 0) {
> > -               /* ignore errors here, /proc could be busy
> > -               perror ("umount /proc");
> > +       if (mount("none", "/proc", NULL, MS_PRIVATE|MS_REC, NULL) != 0)

Please do not use MS_PRIVATE for this. It disconnects propagation in
both directions, which doesn't sound too bad, but actually is. The
reason is that this controls propagation for both mount *and* umount.
Hence any file system you inherited from the root namespace will stay
mounted forever in your detached namespace, and that might be a problem
for the admin, since the device it is mounted from is kept busy
indefinitely. If you use MS_SLAVE, however, any umount on the host will
still propagate into your namespace and thus not keep things busy. Now,
if you only care about /proc then this isn't too bad, as no block
devices are mounted below /proc, but I would still use MS_SLAVE, since
at least binfmt_misc is still mounted there...

MS_PRIVATE only makes sense on file systems you created entirely on your
own.
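
For reference, the MS_SLAVE variant of the remount from the patch quoted
above might look roughly like this. It is only a sketch: the helper name
make_rslave() is made up for illustration, the call needs CAP_SYS_ADMIN,
and the path has to be a mount point.

```c
#include <stdio.h>
#include <sys/mount.h>

/*
 * Hypothetical helper (not part of newpid): mark the mount at `path` and
 * everything below it as slave. Mounts and umounts done in the parent
 * namespace still propagate in; nothing done here propagates back out.
 * Source, filesystem type and data are ignored for propagation changes.
 * Returns 0 on success, -1 on failure (errno is set by mount(2)).
 */
int make_rslave(const char *path)
{
        if (mount(NULL, path, NULL, MS_SLAVE | MS_REC, NULL) != 0) {
                perror("make-rslave");
                return -1;
        }
        return 0;
}
```

In newpid this would presumably be called as make_rslave("/proc") right
before the new proc instance is mounted.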

> Ok, that works, thanks! I only tried to remount / which didn't seem to
> have any effect.
> 
> Unfortunately MS_PRIVATE and MS_REC are not defined in squeeze, so
> that fix won't work for chroots running on a systemd system, but
> that's something I should be able to work around.

They have been available in the kernel for a long, long time. If your
libc doesn't expose them, use something like this:

#ifndef MS_PRIVATE
#define MS_PRIVATE  (1 << 18)
#endif

#ifndef MS_REC
#define MS_REC 16384
#endif

We use the same code in systemd.
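
Since the advice above is to prefer MS_SLAVE, the same fallback pattern
can be extended to cover it. This is a sketch; the values are the ones
used by the Linux kernel headers, but it is worth checking them against
your own headers before relying on them.

```c
#include <sys/mount.h>

/* Fallbacks for old libc headers that lack the propagation flags. */
#ifndef MS_REC
#define MS_REC 16384
#endif

#ifndef MS_PRIVATE
#define MS_PRIVATE (1 << 18)
#endif

#ifndef MS_SLAVE
#define MS_SLAVE (1 << 19)
#endif
```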

Lennart

-- 
Lennart Poettering, Red Hat



Information forwarded to debian-bugs-dist@lists.debian.org, Debian systemd Maintainers <pkg-systemd-maintainers@lists.alioth.debian.org>:
Bug#739593; Package systemd. (Wed, 12 Mar 2014 15:03:04 GMT) (full text, mbox, link).


Message #88 received at 739593@bugs.debian.org (full text, mbox, reply):

From: Christoph Berg <myon@debian.org>
To: Lennart Poettering <lennart@poettering.net>
Cc: Michael Stapelberg <stapelberg@debian.org>, Martin Pitt <mpitt@debian.org>, 739593@bugs.debian.org, Sam Morris <sam@robots.org.uk>
Subject: Re: [Pkg-systemd-maintainers] Bug#739593: Bug#739593: closed by Michael Stapelberg <stapelberg@debian.org> (Re: Bug#739593: systemd makes / shared by default)
Date: Wed, 12 Mar 2014 15:58:29 +0100
[Message part 1 (text/plain, inline)]
Re: Lennart Poettering 2014-03-11 <20140311134118.GC4354@tango.0pointer.de>
> Please do not use MS_PRIVATE for this. This has the result to

Ok, thanks for the suggestion.

> They have been available in the kernel for a long long time. If your libc
> doesn't expose them use something like this:
> 
> #ifndef MS_PRIVATE
> #define MS_PRIVATE  (1 << 18)
> #endif

Yeah, I've done that now (with MS_SLAVE).

Christoph
-- 
cb@df7cb.de | http://www.df7cb.de/
[signature.asc (application/pgp-signature, inline)]

Bug archived. Request was from Debbugs Internal Request <owner@bugs.debian.org> to internal_control@bugs.debian.org. (Thu, 10 Apr 2014 07:32:05 GMT) (full text, mbox, link).


Bug unarchived. Request was from Tomas Pospisek <tpo@sourcepole.ch> to control@bugs.debian.org. (Fri, 13 Feb 2015 14:09:18 GMT) (full text, mbox, link).


Information forwarded to debian-bugs-dist@lists.debian.org, Debian systemd Maintainers <pkg-systemd-maintainers@lists.alioth.debian.org>:
Bug#739593; Package systemd. (Fri, 13 Feb 2015 14:18:04 GMT) (full text, mbox, link).


Acknowledgement sent to Tomas Pospisek <tpo@sourcepole.ch>:
Extra info received and forwarded to list. Copy sent to Debian systemd Maintainers <pkg-systemd-maintainers@lists.alioth.debian.org>. (Fri, 13 Feb 2015 14:18:04 GMT) (full text, mbox, link).


Message #97 received at 739593@bugs.debian.org (full text, mbox, reply):

From: Tomas Pospisek <tpo@sourcepole.ch>
To: 739593@bugs.debian.org
Subject: Re: systemd makes / shared by default (fwd)
Date: Fri, 13 Feb 2015 15:14:17 +0100 (CET)
(resubmitting/forwarding, since on the first attempt this bug was already
 archived and so my comment wasn't included in it)

---------- Forwarded message ----------
Date: Sun, 8 Feb 2015 18:19:06 +0100 (CET)
From: Tomas Pospisek
To: 739593@b.d.o
Cc: Christoph Berg
    Michael Stapelberg
    Bastian Blank
    Sam Morris
    Martin Pitt
Subject: Re: systemd makes / shared by default

Hello all,

there's more fallout from the change of the default that makes bind mounts 
share submounts (as introduced by systemd) instead of the previous default that 
kept them private (as given by the linux kernel).

I have a variety of chroot systems that go like this:

  sudo mount --rbind /dev $CHROOT/dev
  sudo mount -t tmpfs tmpfs $CHROOT/run/shm

  # exec some not very much trusted app such as skype

  sudo umount -l $CHROOT/run/shm
  sudo umount -l $CHROOT/run
  sudo umount -l $CHROOT/dev

This worked under wheezy. Under jessie, however, it wreaks havoc on the running
system: /dev/shm gets unmounted in the base (parent) system, and so a lot
of stuff stops working (such as my terminal application "konsole", system
shutdown/reboot, chromium, etc.).
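
One way to see in advance whether an umount could propagate back is to look
at the optional fields in /proc/self/mountinfo: "shared:N" marks a shared
mount and "master:N" a slave. A rough sketch of such a check in C follows;
the function and constant names are invented for illustration, and it
inspects a single mountinfo line that the caller has already read.

```c
#include <string.h>

/* Illustrative names, not taken from any existing tool. */
enum { PROP_SHARED = 1, PROP_SLAVE = 2 };

/*
 * Inspect one line of /proc/self/mountinfo. Returns a bitmask of
 * PROP_SHARED / PROP_SLAVE (0 means neither tag was present), or -1 if
 * the line has no "-" separator after the first six fields.
 */
int mountinfo_propagation(const char *line)
{
        int field = 0;
        int result = 0;
        const char *p = line;

        while (*p != '\0') {
                size_t len;

                p += strspn(p, " \n");          /* skip separators */
                len = strcspn(p, " \n");        /* length of this token */
                if (len == 0)
                        break;
                field++;
                if (field > 6) {                /* optional fields start here */
                        if (len == 1 && p[0] == '-')
                                return result;  /* "-" ends the optional fields */
                        if (len > 7 && strncmp(p, "shared:", 7) == 0)
                                result |= PROP_SHARED;
                        else if (len > 7 && strncmp(p, "master:", 7) == 0)
                                result |= PROP_SLAVE;
                }
                p += len;
        }
        return -1;
}
```

If the line for $CHROOT/dev reports PROP_SHARED, a recursive umount there
can be expected to reach the original /dev tree as well.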

<opinion>
I *think* that if Debian had a Linus type "benevolent dictator" that dictator 
would at this moment be on a spree to verbally kill people for breaking the 
system's API. I opine that unless the matter of API stability will be taken as 
seriously as Linus does there will be no "year of the Linux desktop" ever, 
since application writers can't be expected to be running around in circles all 
year long fixing "petty" API breaks left and right on every odd Linux 
distribution. And users can't be expected to be rebuilding their systems from 
scratch and reinstalling all their (custom, proprietary, weird) software from 
new versions every few years just because the base system had an upgrade.
</opinion>

But aside from making my opinion known here, I am unable to offer a remedy 
apart from reverting the default, which would break other software, that 
depends on the *new* default behavior as set by systemd.

So unless someone has a clever idea, I'm just going to document this in the 
Debian wiki.

I think a warning in the release notes would also be appropriate.
*t



Information forwarded to debian-bugs-dist@lists.debian.org, Debian systemd Maintainers <pkg-systemd-maintainers@lists.alioth.debian.org>:
Bug#739593; Package systemd. (Fri, 13 Feb 2015 14:21:05 GMT) (full text, mbox, link).


Acknowledgement sent to Tomas Pospisek <tpo@sourcepole.ch>:
Extra info received and forwarded to list. Copy sent to Debian systemd Maintainers <pkg-systemd-maintainers@lists.alioth.debian.org>. (Fri, 13 Feb 2015 14:21:05 GMT) (full text, mbox, link).


Message #102 received at 739593@bugs.debian.org (full text, mbox, reply):

From: Tomas Pospisek <tpo@sourcepole.ch>
To: Christoph Berg <christoph.berg@credativ.de>
Cc: 739593@bugs.debian.org, Michael Stapelberg <stapelberg@debian.org>, Bastian Blank <bastian.blank@credativ.de>, Sam Morris <sam@robots.org.uk>, Martin Pitt <mpitt@debian.org>
Subject: Re: systemd makes / shared by default
Date: Fri, 13 Feb 2015 15:19:08 +0100 (CET)
On Mon, 9 Feb 2015, Christoph Berg wrote:

> Re: Tomas Pospisek 2015-02-08 <alpine.DEB.2.11.1502081748110.2557@hier>
>> Hello all,
>>
>> there's more fallout from the change of the default that makes bind mounts
>> share submounts (as introduced by systemd) instead of the previous default
>> that kept them private (as given by the linux kernel).
>>
>> I have a variety of chroot systems that go like this:
>>
>>   sudo mount --rbind /dev $CHROOT/dev
>>   sudo mount -t tmpfs tmpfs $CHROOT/run/shm
>
> I think you need to execute the above in a "unshare -m" environment to
> get disconnected from the / mount namespace.

That's not sufficient though, you'll still need to sing the special:

  mount --make-rslave (or --make-rprivate)

incantation as documented in the unshare man page. In the end I think 
making "unshare -m" do that magic incantation by itself, as considered 
somewhere on the util-linux mailing list (I don't have the reference at hand), 
would be best here.
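
For programs that do this themselves rather than via the command line, the
two steps ("unshare -m" followed by "mount --make-rslave /") could be
combined roughly as follows. This is a sketch under assumptions: the
function name is hypothetical, the caller needs CAP_SYS_ADMIN, and "/" is
assumed to be a mount point in the calling process's view.

```c
#define _GNU_SOURCE             /* for unshare() and CLONE_NEWNS */
#include <sched.h>
#include <stdio.h>
#include <sys/mount.h>

/*
 * Hypothetical helper: move the calling process into a new mount
 * namespace and mark the whole tree as slave, so that later mounts and
 * umounts made here do not propagate back to the parent namespace.
 * Returns 0 on success, -1 on failure (errno is set by the failing call).
 */
int enter_detached_mount_namespace(void)
{
        if (unshare(CLONE_NEWNS) != 0) {
                perror("unshare(CLONE_NEWNS)");
                return -1;
        }
        if (mount(NULL, "/", NULL, MS_SLAVE | MS_REC, NULL) != 0) {
                perror("make-rslave /");
                return -1;
        }
        return 0;
}
```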

> The weird part is that you can tweak some "global" options *locally*.
>
> But yes, it's hilarious that we need to take care about this stuff...

The current semantics are really absurd, unexpected and surprising, and 
although I can understand and agree with (or that's what I believe) 
Lennart's argument for changing it, I think the change of default was 
ultimately a mistake, since it stands in stark contrast to the principle 
of least surprise.

As a consequence it makes us all less safe, I think, since whatever is done 
inside the bind mount, the chroot, or the unshared namespace will 
affect the parent if one forgets to do the extra dance 
to disconnect the mount from the parent.

I'm pondering both bringing this up on d-d and having it documented in 
the release notes. But currently I simply don't have the time to follow 
through with this.
*t



Bug archived. Request was from Debbugs Internal Request <owner@bugs.debian.org> to internal_control@bugs.debian.org. (Sat, 14 Mar 2015 07:28:34 GMT) (full text, mbox, link).


Debian bug tracking system administrator <owner@bugs.debian.org>. Last modified: Wed Jan 10 13:26:22 2018; Machine Name: beach
