mount_namespaces(7) - Man pages

dwww Home | Manual pages | Find package
mount_namespaces(7)    Miscellaneous Information Manual    mount_namespaces(7)

NAME
       mount_namespaces - overview of Linux mount namespaces

DESCRIPTION
       For an overview of namespaces, see namespaces(7).

       Mount  namespaces  provide  isolation of the list of mounts seen by the
       processes in each namespace instance.  Thus, the processes in  each  of
       the  mount namespace instances will see distinct single-directory hier-
       archies.

       The views provided by the  /proc/pid/mounts,  /proc/pid/mountinfo,  and
       /proc/pid/mountstats files (all described in proc(5)) correspond to the
       mount namespace in which the process with the PID pid resides.  (All of
       the processes that reside in the same mount namespace will see the same
       view in these files.)

       A new mount namespace is created using either  clone(2)  or  unshare(2)
       with  the CLONE_NEWNS flag.  When a new mount namespace is created, its
       mount list is initialized as follows:

       •  If the namespace is created using clone(2), the mount  list  of  the
          child's  namespace  is  a  copy  of  the  mount  list  in the parent
          process's mount namespace.

       •  If the namespace is created using unshare(2), the mount list of  the
          new  namespace  is a copy of the mount list in the caller's previous
          mount namespace.

       Subsequent modifications to the mount list (mount(2) and umount(2))  in
       either mount namespace will not (by default) affect the mount list seen
       in the other namespace (but see the following discussion of shared sub-
       trees).

SHARED SUBTREES
       After  the implementation of mount namespaces was completed, experience
       showed that the isolation that they provided was, in  some  cases,  too
       great.   For  example,  in  order  to  make a newly loaded optical disk
       available in all mount namespaces, a mount operation  was  required  in
       each namespace.  For this use case, and others, the shared subtree fea-
       ture was introduced in Linux 2.6.15.  This  feature  allows  for  auto-
       matic,  controlled propagation of mount(2) and umount(2) events between
       namespaces (or, more precisely, between the mounts that are members  of
       a peer group that are propagating events to one another).

       Each  mount  is  marked  (via  mount(2)) as having one of the following
       propagation types:

       MS_SHARED
              This mount shares events with members of a peer group.  mount(2)
              and umount(2) events immediately under this mount will propagate
              to the other mounts that are members of the peer group.   Propa-
              gation here means that the same mount(2) or umount(2) will auto-
              matically occur under all of the other mounts in the peer group.
              Conversely,  mount(2) and umount(2) events that take place under
              peer mounts will propagate to this mount.

       MS_PRIVATE
              This mount is private; it does not have a peer group.   mount(2)
              and umount(2) events do not propagate into or out of this mount.

       MS_SLAVE
              mount(2)  and  umount(2) events propagate into this mount from a
              (master) shared peer group.  mount(2) and umount(2) events under
              this mount do not propagate to any peer.

              Note  that  a mount can be the slave of another peer group while
              at the same time sharing mount(2) and umount(2)  events  with  a
              peer  group  of which it is a member.  (More precisely, one peer
              group can be the slave of another peer group.)

       MS_UNBINDABLE
              This is like a private mount, and in addition this  mount  can't
              be  bind  mounted.   Attempts to bind mount this mount (mount(2)
              with the MS_BIND flag) will fail.

              When a recursive bind  mount  (mount(2)  with  the  MS_BIND  and
              MS_REC  flags)  is  performed  on  a directory subtree, any bind
              mounts within the subtree are automatically  pruned  (i.e.,  not
              replicated)  when replicating that subtree to produce the target
              subtree.

       For a discussion of the propagation type assigned to a new  mount,  see
       NOTES.

       The  propagation  type is a per-mount-point setting; some mounts may be
       marked as shared (with each shared mount being a member of  a  distinct
       peer group), while others are private (or slaved or unbindable).

       Note  that  a  mount's propagation type determines whether mount(2) and
       umount(2) of mounts immediately under the mount are propagated.   Thus,
       the  propagation  type does not affect propagation of events for grand-
       children and further removed descendant mounts.  What  happens  if  the
       mount itself is unmounted is determined by the propagation type that is
       in effect for the parent of the mount.

       Members are added to a peer group when a mount is marked as shared  and
       either:

       (a)  the  mount  is replicated during the creation of a new mount name-
            space; or

       (b)  a new bind mount is created from the mount.

       In both of these cases, the new mount joins the peer group of which the
       existing mount is a member.

       A new peer group is also created when a child mount is created under an
       existing mount that is marked as shared.  In this case, the  new  child
       mount is also marked as shared and the resulting peer group consists of
       all the mounts that are replicated under the peers of parent mounts.

       A mount ceases to be a member of a peer group when either the mount  is
       explicitly unmounted, or when the mount is implicitly unmounted because
       a mount namespace is removed (because it has no more member processes).

       The propagation type of the mounts in a mount namespace can be  discov-
       ered  via  the  "optional fields" exposed in /proc/pid/mountinfo.  (See
       proc(5) for details of this file.)  The following tags  can  appear  in
       the optional fields for a record in that file:

       shared:X
              This  mount  is  shared  in peer group X.  Each peer group has a
              unique ID that is automatically generated by the kernel, and all
              mounts in the same peer group will show the same ID.  (These IDs
              are assigned starting from the value 1, and may be recycled when
              a peer group ceases to have any members.)

       master:X
              This mount is a slave to shared peer group X.

       propagate_from:X (since Linux 2.6.26)
              This  mount is a slave and receives propagation from shared peer
              group X.  This tag will always appear in conjunction with a mas-
              ter:X tag.  Here, X is the closest dominant peer group under the
              process's root directory.  If X is the immediate master  of  the
              mount,  or  if  there  is  no dominant peer group under the same
              root, then only the master:X field is present and not the propa-
              gate_from:X field.  For further details, see below.

       unbindable
              This is an unbindable mount.

       If none of the above tags is present, then this is a private mount.

   MS_SHARED and MS_PRIVATE example
       Suppose  that on a terminal in the initial mount namespace, we mark one
       mount as shared and another as private, and then  view  the  mounts  in
       /proc/self/mountinfo:

           sh1# mount --make-shared /mntS
           sh1# mount --make-private /mntP
           sh1# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
           77 61 8:17 / /mntS rw,relatime shared:1
           83 61 8:15 / /mntP rw,relatime

       From  the  /proc/self/mountinfo  output,  we see that /mntS is a shared
       mount in peer group 1, and that /mntP has no optional tags,  indicating
       that  it  is  a  private mount.  The first two fields in each record in
       this file are the unique ID for this mount, and the  mount  ID  of  the
       parent  mount.  We can further inspect this file to see that the parent
       mount of /mntS and /mntP is the root directory, /, which is mounted  as
       private:

           sh1# cat /proc/self/mountinfo | awk '$1 == 61' | sed 's/ - .*//'
           61 0 8:2 / / rw,relatime

       On  a  second  terminal, we create a new mount namespace where we run a
       second shell and inspect the mounts:

           $ PS1='sh2# ' sudo unshare -m --propagation unchanged sh
           sh2# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
           222 145 8:17 / /mntS rw,relatime shared:1
           225 145 8:15 / /mntP rw,relatime

       The new mount namespace received a copy  of  the  initial  mount  name-
       space's  mounts.  These new mounts maintain the same propagation types,
       but have unique mount IDs.  (The --propagation  unchanged  option  pre-
       vents unshare(1) from marking all mounts as private when creating a new
       mount namespace, which it does by default.)

       In the second terminal, we then create submounts under  each  of  /mntS
       and /mntP and inspect the set-up:

           sh2# mkdir /mntS/a
           sh2# mount /dev/sdb6 /mntS/a
           sh2# mkdir /mntP/b
           sh2# mount /dev/sdb7 /mntP/b
           sh2# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
           222 145 8:17 / /mntS rw,relatime shared:1
           225 145 8:15 / /mntP rw,relatime
           178 222 8:22 / /mntS/a rw,relatime shared:2
           230 225 8:23 / /mntP/b rw,relatime

       From  the above, it can be seen that /mntS/a was created as shared (in-
       heriting this setting from its parent mount) and /mntP/b was created as
       a private mount.

       Returning  to the first terminal and inspecting the set-up, we see that
       the new mount created under the shared mount /mntS  propagated  to  its
       peer  mount (in the initial mount namespace), but the new mount created
       under the private mount /mntP did not propagate:

           sh1# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
           77 61 8:17 / /mntS rw,relatime shared:1
           83 61 8:15 / /mntP rw,relatime
           179 77 8:22 / /mntS/a rw,relatime shared:2

   MS_SLAVE example
       Making a mount a slave allows it to  receive  propagated  mount(2)  and
       umount(2)  events  from a master shared peer group, while preventing it
       from propagating events to that master.  This is useful if we  want  to
       (say) receive a mount event when an optical disk is mounted in the mas-
       ter shared peer group (in another mount namespace), but want to prevent
       mount(2)  and  umount(2)  events under the slave mount from having side
       effects in other namespaces.

       We can demonstrate the effect of slaving by first marking two mounts as
       shared in the initial mount namespace:

           sh1# mount --make-shared /mntX
           sh1# mount --make-shared /mntY
           sh1# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
           132 83 8:23 / /mntX rw,relatime shared:1
           133 83 8:22 / /mntY rw,relatime shared:2

       On  a  second terminal, we create a new mount namespace and inspect the
       mounts:

           sh2# unshare -m --propagation unchanged sh
           sh2# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
           168 167 8:23 / /mntX rw,relatime shared:1
           169 167 8:22 / /mntY rw,relatime shared:2

       In the new mount namespace, we then mark one of the mounts as a slave:

           sh2# mount --make-slave /mntY
           sh2# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
           168 167 8:23 / /mntX rw,relatime shared:1
           169 167 8:22 / /mntY rw,relatime master:2

       From the above output, we see that /mntY is now a slave mount  that  is
       receiving propagation events from the shared peer group with the ID 2.

       Continuing  in  the  new  namespace,  we create submounts under each of
       /mntX and /mntY:

           sh2# mkdir /mntX/a
           sh2# mount /dev/sda3 /mntX/a
           sh2# mkdir /mntY/b
           sh2# mount /dev/sda5 /mntY/b

       When we inspect the state of the mounts in the new mount namespace,  we
       see  that  /mntX/a  was  created  as a new shared mount (inheriting the
       "shared" setting from its parent mount) and /mntY/b was  created  as  a
       private mount:

           sh2# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
           168 167 8:23 / /mntX rw,relatime shared:1
           169 167 8:22 / /mntY rw,relatime master:2
           173 168 8:3 / /mntX/a rw,relatime shared:3
           175 169 8:5 / /mntY/b rw,relatime

       Returning  to  the  first terminal (in the initial mount namespace), we
       see that the mount /mntX/a propagated to the peer (the  shared  /mntX),
       but the mount /mntY/b was not propagated:

           sh1# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
           132 83 8:23 / /mntX rw,relatime shared:1
           133 83 8:22 / /mntY rw,relatime shared:2
           174 132 8:3 / /mntX/a rw,relatime shared:3

       Now we create a new mount under /mntY in the first shell:

           sh1# mkdir /mntY/c
           sh1# mount /dev/sda1 /mntY/c
           sh1# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
           132 83 8:23 / /mntX rw,relatime shared:1
           133 83 8:22 / /mntY rw,relatime shared:2
           174 132 8:3 / /mntX/a rw,relatime shared:3
           178 133 8:1 / /mntY/c rw,relatime shared:4

       When  we  examine the mounts in the second mount namespace, we see that
       in this case the new mount has been propagated to the slave mount,  and
       that the new mount is itself a slave mount (to peer group 4):

           sh2# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
           168 167 8:23 / /mntX rw,relatime shared:1
           169 167 8:22 / /mntY rw,relatime master:2
           173 168 8:3 / /mntX/a rw,relatime shared:3
           175 169 8:5 / /mntY/b rw,relatime
           179 169 8:1 / /mntY/c rw,relatime master:4

   MS_UNBINDABLE example
       One of the primary purposes of unbindable mounts is to avoid the "mount
       explosion" problem when repeatedly performing bind mounts of a  higher-
       level  subtree  at  a lower-level mount.  The problem is illustrated by
       the following shell session.

       Suppose we have a system with the following mounts:

           # mount | awk '{print $1, $2, $3}'
           /dev/sda1 on /
           /dev/sdb6 on /mntX
           /dev/sdb7 on /mntY

       Suppose furthermore that we wish to recursively bind mount the root di-
       rectory  under  several  users'  home  directories.  We do this for the
       first user, and inspect the mounts:

           # mount --rbind / /home/cecilia/
           # mount | awk '{print $1, $2, $3}'
           /dev/sda1 on /
           /dev/sdb6 on /mntX
           /dev/sdb7 on /mntY
           /dev/sda1 on /home/cecilia
           /dev/sdb6 on /home/cecilia/mntX
           /dev/sdb7 on /home/cecilia/mntY

       When we repeat this operation for the second user, we start to see  the
       explosion problem:

           # mount --rbind / /home/henry
           # mount | awk '{print $1, $2, $3}'
           /dev/sda1 on /
           /dev/sdb6 on /mntX
           /dev/sdb7 on /mntY
           /dev/sda1 on /home/cecilia
           /dev/sdb6 on /home/cecilia/mntX
           /dev/sdb7 on /home/cecilia/mntY
           /dev/sda1 on /home/henry
           /dev/sdb6 on /home/henry/mntX
           /dev/sdb7 on /home/henry/mntY
           /dev/sda1 on /home/henry/home/cecilia
           /dev/sdb6 on /home/henry/home/cecilia/mntX
           /dev/sdb7 on /home/henry/home/cecilia/mntY

       Under  /home/henry,  we  have  not only recursively added the /mntX and
       /mntY mounts, but also the recursive mounts of those directories  under
       /home/cecilia  that  were created in the previous step.  Upon repeating
       the step for a third user, it becomes obvious that the explosion is ex-
       ponential in nature:

           # mount --rbind / /home/otto
           # mount | awk '{print $1, $2, $3}'
           /dev/sda1 on /
           /dev/sdb6 on /mntX
           /dev/sdb7 on /mntY
           /dev/sda1 on /home/cecilia
           /dev/sdb6 on /home/cecilia/mntX
           /dev/sdb7 on /home/cecilia/mntY
           /dev/sda1 on /home/henry
           /dev/sdb6 on /home/henry/mntX
           /dev/sdb7 on /home/henry/mntY
           /dev/sda1 on /home/henry/home/cecilia
           /dev/sdb6 on /home/henry/home/cecilia/mntX
           /dev/sdb7 on /home/henry/home/cecilia/mntY
           /dev/sda1 on /home/otto
           /dev/sdb6 on /home/otto/mntX
           /dev/sdb7 on /home/otto/mntY
           /dev/sda1 on /home/otto/home/cecilia
           /dev/sdb6 on /home/otto/home/cecilia/mntX
           /dev/sdb7 on /home/otto/home/cecilia/mntY
           /dev/sda1 on /home/otto/home/henry
           /dev/sdb6 on /home/otto/home/henry/mntX
           /dev/sdb7 on /home/otto/home/henry/mntY
           /dev/sda1 on /home/otto/home/henry/home/cecilia
           /dev/sdb6 on /home/otto/home/henry/home/cecilia/mntX
           /dev/sdb7 on /home/otto/home/henry/home/cecilia/mntY

       The  mount  explosion  problem  in the above scenario can be avoided by
       making each of the new mounts unbindable.  The effect of doing this  is
       that  recursive mounts of the root directory will not replicate the un-
       bindable mounts.  We make such a mount for the first user:

           # mount --rbind --make-unbindable / /home/cecilia

       Before going further, we show that unbindable mounts are indeed unbind-
       able:

           # mkdir /mntZ
           # mount --bind /home/cecilia /mntZ
           mount: wrong fs type, bad option, bad superblock on /home/cecilia,
                  missing codepage or helper program, or other error

                  In some cases useful info is found in syslog - try
                  dmesg | tail or so.

       Now we create unbindable recursive bind mounts for the other two users:

           # mount --rbind --make-unbindable / /home/henry
           # mount --rbind --make-unbindable / /home/otto

       Upon  examining  the list of mounts, we see there has been no explosion
       of mounts, because the unbindable mounts were not replicated under each
       user's directory:

           # mount | awk '{print $1, $2, $3}'
           /dev/sda1 on /
           /dev/sdb6 on /mntX
           /dev/sdb7 on /mntY
           /dev/sda1 on /home/cecilia
           /dev/sdb6 on /home/cecilia/mntX
           /dev/sdb7 on /home/cecilia/mntY
           /dev/sda1 on /home/henry
           /dev/sdb6 on /home/henry/mntX
           /dev/sdb7 on /home/henry/mntY
           /dev/sda1 on /home/otto
           /dev/sdb6 on /home/otto/mntX
           /dev/sdb7 on /home/otto/mntY

   Propagation type transitions
       The  following  table  shows the effect that applying a new propagation
       type (i.e., mount --make-xxxx) has on the existing propagation type  of
       a  mount.   The  rows correspond to existing propagation types, and the
       columns are the new propagation settings.  For reasons of space,  "pri-
       vate" is abbreviated as "priv" and "unbindable" as "unbind".

                     make-shared   make-slave      make-priv  make-unbind
       ─────────────┬───────────────────────────────────────────────────────
       shared       │shared        slave/priv [1]  priv       unbind
       slave        │slave+shared  slave [2]       priv       unbind
       slave+shared │slave+shared  slave           priv       unbind
       private      │shared        priv [2]        priv       unbind
       unbindable   │shared        unbind [2]      priv       unbind

       Note the following details to the table:

       [1] If  a shared mount is the only mount in its peer group, making it a
           slave automatically makes it private.

       [2] Slaving a nonshared mount has no effect on the mount.

   Bind (MS_BIND) semantics
       Suppose that the following command is performed:

           mount --bind A/a B/b

       Here, A is the source mount, B is the destination mount, a is a  subdi-
       rectory  path under the mount point A, and b is a subdirectory path un-
       der the mount point B.  The propagation type of  the  resulting  mount,
       B/b,  depends  on  the  propagation types of the mounts A and B, and is
       summarized in the following table.

                                  source(A)
                          shared  private    slave         unbind
       ──────────────────┬──────────────────────────────────────────
       dest(B)  shared   │shared  shared     slave+shared  invalid
                nonshared│shared  private    slave         invalid

       Note that a recursive bind of a subtree follows the same  semantics  as
       for  a bind operation on each mount in the subtree.  (Unbindable mounts
       are automatically pruned at the target mount point.)

       For further details, see Documentation/filesystems/sharedsubtree.rst in
       the kernel source tree.

   Move (MS_MOVE) semantics
       Suppose that the following command is performed:

           mount --move A B/b

       Here,  A  is  the  source mount, B is the destination mount, and b is a
       subdirectory path under the mount point B.  The propagation type of the
       resulting  mount, B/b, depends on the propagation types of the mounts A
       and B, and is summarized in the following table.

                                  source(A)
                          shared  private    slave         unbind
       ──────────────────┬─────────────────────────────────────────────
       dest(B)  shared   │shared  shared     slave+shared  invalid
                nonshared│shared  private    slave         unbindable

       Note: moving a mount that resides under a shared mount is invalid.

       For further details, see Documentation/filesystems/sharedsubtree.rst in
       the kernel source tree.

   Mount semantics
       Suppose that we use the following command to create a mount:

           mount device B/b

       Here,  B  is  the destination mount, and b is a subdirectory path under
       the mount point B.  The propagation type of the resulting  mount,  B/b,
       follows  the same rules as for a bind mount, where the propagation type
       of the source mount is considered always to be private.

   Unmount semantics
       Suppose that we use the following command to tear down a mount:

           umount A

       Here, A is a mount on B/b, where B is the parent mount and b is a  sub-
       directory path under the mount point B.  If B is shared, then all most-
       recently-mounted mounts at b on mounts that  receive  propagation  from
       mount B and do not have submounts under them are unmounted.

   The /proc/ pid /mountinfo propagate_from tag
       The  propagate_from:X  tag  is  shown  in  the  optional  fields  of  a
       /proc/pid/mountinfo record in cases where a process can't see a slave's
       immediate  master  (i.e.,  the  pathname of the master is not reachable
       from the filesystem root directory) and so cannot determine  the  chain
       of propagation between the mounts it can see.

       In the following example, we first create a two-link master-slave chain
       between the mounts /mnt, /tmp/etc,  and  /mnt/tmp/etc.   Then  the  ch-
       root(1)  command  is  used to make the /tmp/etc mount point unreachable
       from the root directory, creating  a  situation  where  the  master  of
       /mnt/tmp/etc  is  not  reachable  from  the (new) root directory of the
       process.

       First, we bind mount the root directory onto /mnt and then  bind  mount
       /proc  at  /mnt/proc  so  that  after  the  later chroot(1) the proc(5)
       filesystem remains visible at the correct location in the chroot-ed en-
       vironment.

           # mkdir -p /mnt/proc
           # mount --bind / /mnt
           # mount --bind /proc /mnt/proc

       Next,  we  ensure  that  the /mnt mount is a shared mount in a new peer
       group (with no peers):

           # mount --make-private /mnt  # Isolate from any previous peer group
           # mount --make-shared /mnt
           # cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
           239 61 8:2 / /mnt ... shared:102
           248 239 0:4 / /mnt/proc ... shared:5

       Next, we bind mount /mnt/etc onto /tmp/etc:

           # mkdir -p /tmp/etc
           # mount --bind /mnt/etc /tmp/etc
           # cat /proc/self/mountinfo | egrep '/mnt|/tmp/' | sed 's/ - .*//'
           239 61 8:2 / /mnt ... shared:102
           248 239 0:4 / /mnt/proc ... shared:5
           267 40 8:2 /etc /tmp/etc ... shared:102

       Initially, these two mounts are in the same peer  group,  but  we  then
       make the /tmp/etc a slave of /mnt/etc, and then make /tmp/etc shared as
       well, so that it can propagate events to the next slave in the chain:

           # mount --make-slave /tmp/etc
           # mount --make-shared /tmp/etc
           # cat /proc/self/mountinfo | egrep '/mnt|/tmp/' | sed 's/ - .*//'
           239 61 8:2 / /mnt ... shared:102
           248 239 0:4 / /mnt/proc ... shared:5
           267 40 8:2 /etc /tmp/etc ... shared:105 master:102

       Then we bind mount /tmp/etc onto /mnt/tmp/etc.  Again, the  two  mounts
       are  initially  in the same peer group, but we then make /mnt/tmp/etc a
       slave of /tmp/etc:

           # mkdir -p /mnt/tmp/etc
           # mount --bind /tmp/etc /mnt/tmp/etc
           # mount --make-slave /mnt/tmp/etc
           # cat /proc/self/mountinfo | egrep '/mnt|/tmp/' | sed 's/ - .*//'
           239 61 8:2 / /mnt ... shared:102
           248 239 0:4 / /mnt/proc ... shared:5
           267 40 8:2 /etc /tmp/etc ... shared:105 master:102
           273 239 8:2 /etc /mnt/tmp/etc ... master:105

       From the above, we see that /mnt is the master of the  slave  /tmp/etc,
       which in turn is the master of the slave /mnt/tmp/etc.

       We  then  chroot(1) to the /mnt directory, which renders the mount with
       ID 267 unreachable from the (new) root directory:

           # chroot /mnt

       When we examine the state of the mounts inside the  chroot-ed  environ-
       ment, we see the following:

           # cat /proc/self/mountinfo | sed 's/ - .*//'
           239 61 8:2 / / ... shared:102
           248 239 0:4 / /proc ... shared:5
           273 239 8:2 /etc /tmp/etc ... master:105 propagate_from:102

       Above, we see that the mount with ID 273 is a slave whose master is the
       peer group 105.  The mount point for that master is unreachable, and so
       a propagate_from tag is displayed, indicating that the closest dominant
       peer group (i.e., the nearest reachable mount in the  slave  chain)  is
       the  peer  group with the ID 102 (corresponding to the /mnt mount point
       before the chroot(1) was performed).

VERSIONS
       Mount namespaces first appeared in Linux 2.4.19.

STANDARDS
       Namespaces are a Linux-specific feature.

NOTES
       The propagation type assigned to a new mount depends on the propagation
       type  of  the  parent  mount.  If the mount has a parent (i.e., it is a
       non-root mount point)  and  the  propagation  type  of  the  parent  is
       MS_SHARED,  then  the  propagation  type  of  the  new  mount  is  also
       MS_SHARED.  Otherwise, the propagation type of the new mount is MS_PRI-
       VATE.

       Notwithstanding  the  fact  that  the  default propagation type for new
       mount is in many cases MS_PRIVATE, MS_SHARED is typically more  useful.
       For  this  reason,  systemd(1)  automatically  remounts  all  mounts as
       MS_SHARED on system startup.  Thus, on most modern systems, the default
       propagation type is in practice MS_SHARED.

       Since,  when  one uses unshare(1) to create a mount namespace, the goal
       is commonly to provide full isolation of the mounts in  the  new  name-
       space, unshare(1) (since util-linux
        2.27) in turn reverses the step performed by systemd(1), by making all
       mounts private in the new namespace.  That is, unshare(1) performs  the
       equivalent of the following in the new mount namespace:

           mount --make-rprivate /

       To  prevent this, one can use the --propagation unchanged option to un-
       share(1).

       An application that  creates  a  new  mount  namespace  directly  using
       clone(2)  or  unshare(2)  may  desire  to  prevent propagation of mount
       events to other mount namespaces (as is done by unshare(1)).  This  can
       be done by changing the propagation type of mounts in the new namespace
       to either MS_SLAVE or MS_PRIVATE, using a call such as the following:

           mount(NULL, "/", MS_SLAVE | MS_REC, NULL);

       For a discussion of propagation types when moving mounts (MS_MOVE)  and
       creating  bind  mounts (MS_BIND), see Documentation/filesystems/shared-
       subtree.rst.

   Restrictions on mount namespaces
       Note the following points with respect to mount namespaces:

       [1] Each mount namespace has an owner  user  namespace.   As  explained
           above,  when  a  new  mount namespace is created, its mount list is
           initialized as a copy of the mount list of another mount namespace.
           If  the  new  namespace and the namespace from which the mount list
           was copied are owned by different user  namespaces,  then  the  new
           mount namespace is considered less privileged.

       [2] When  creating a less privileged mount namespace, shared mounts are
           reduced to slave mounts.  This ensures that mappings  performed  in
           less  privileged mount namespaces will not propagate to more privi-
           leged mount namespaces.

       [3] Mounts that come as a single unit  from  a  more  privileged  mount
           namespace  are  locked  together and may not be separated in a less
           privileged mount namespace.  (The unshare(2) CLONE_NEWNS  operation
           brings  across  all of the mounts from the original mount namespace
           as a single unit, and recursive mounts that propagate between mount
           namespaces propagate as a single unit.)

           In  this  context, "may not be separated" means that the mounts are
           locked so that they may not be  individually  unmounted.   Consider
           the following example:

               $ sudo sh
               # mount --bind /dev/null /etc/shadow
               # cat /etc/shadow       # Produces no output

           The  above  steps,  performed in a more privileged mount namespace,
           have created a bind mount that obscures the contents of the  shadow
           password file, /etc/shadow.  For security reasons, it should not be
           possible to umount(2) that mount in a less privileged  mount  name-
           space, since that would reveal the contents of /etc/shadow.

           Suppose  we  now  create  a new mount namespace owned by a new user
           namespace.  The new mount namespace will inherit copies of  all  of
           the  mounts  from  the  previous  mount  namespace.  However, those
           mounts will be locked because the new mount namespace is less priv-
           ileged.   Consequently,  an attempt to umount(2) the mount fails as
           show in the following step:

               # unshare --user --map-root-user --mount \
                              strace -o /tmp/log \
                              umount /mnt/dir
               umount: /etc/shadow: not mounted.
               # grep '^umount' /tmp/log
               umount2("/etc/shadow", 0)     = -1 EINVAL (Invalid argument)

           The error message from mount(8) is  a  little  confusing,  but  the
           strace(1) output reveals that the underlying umount2(2) system call
           failed with the error EINVAL, which is the error  that  the  kernel
           returns to indicate that the mount is locked.

           Note,  however,  that it is possible to stack (and unstack) a mount
           on top of one of the inherited locked mounts in a  less  privileged
           mount namespace:

               # echo 'aaaaa' > /tmp/a    # File to mount onto /etc/shadow
               # unshare --user --map-root-user --mount \
                   sh -c 'mount --bind /tmp/a /etc/shadow; cat /etc/shadow'
               aaaaa
               # umount /etc/shadow

           The  final  umount(8) command above, which is performed in the ini-
           tial mount namespace, makes the original /etc/shadow file once more
           visible in that namespace.

       [4] Following  on from point [3], note that it is possible to umount(2)
           an entire subtree of mounts that propagated as a unit into  a  less
           privileged  mount  namespace, as illustrated in the following exam-
           ple.

           First, we create new user and mount  namespaces  using  unshare(1).
           In  the  new mount namespace, the propagation type of all mounts is
           set to private.  We then create a shared bind mount at /mnt, and  a
           small hierarchy of mounts underneath that mount.

               $ PS1='ns1# ' sudo unshare --user --map-root-user \
                                      --mount --propagation private bash
               ns1# echo $$        # We need the PID of this shell later
               778501
               ns1# mount --make-shared --bind /mnt /mnt
               ns1# mkdir /mnt/x
               ns1# mount --make-private -t tmpfs none /mnt/x
               ns1# mkdir /mnt/x/y
               ns1# mount --make-private -t tmpfs none /mnt/x/y
               ns1# grep /mnt /proc/self/mountinfo | sed 's/ - .*//'
               986 83 8:5 /mnt /mnt rw,relatime shared:344
               989 986 0:56 / /mnt/x rw,relatime
               990 989 0:57 / /mnt/x/y rw,relatime

           Continuing in the same shell session, we then create a second shell
           in a new user namespace and a new (less privileged) mount namespace
           and check the state of the propagated mounts rooted at /mnt.

               ns1# PS1='ns2# ' unshare --user --map-root-user \
                                      --mount --propagation unchanged bash
               ns2# grep /mnt /proc/self/mountinfo | sed 's/ - .*//'
               1239 1204 8:5 /mnt /mnt rw,relatime master:344
               1240 1239 0:56 / /mnt/x rw,relatime
               1241 1240 0:57 / /mnt/x/y rw,relatime

           Of  note  in  the  above output is that the propagation type of the
           mount /mnt has been reduced to slave, as explained  in  point  [2].
           This means that submount events will propagate from the master /mnt
           in "ns1", but propagation will not occur in the opposite direction.

           From a separate terminal window, we then use  nsenter(1)  to  enter
           the mount and user namespaces corresponding to "ns1".  In that ter-
           minal window, we then recursively bind mount /mnt/x at the location
           /mnt/ppp.

               $ PS1='ns3# ' sudo nsenter -t 778501 --user --mount
               ns3# mount --rbind --make-private /mnt/x /mnt/ppp
               ns3# grep /mnt /proc/self/mountinfo | sed 's/ - .*//'
               986 83 8:5 /mnt /mnt rw,relatime shared:344
               989 986 0:56 / /mnt/x rw,relatime
               990 989 0:57 / /mnt/x/y rw,relatime
               1242 986 0:56 / /mnt/ppp rw,relatime
               1243 1242 0:57 / /mnt/ppp/y rw,relatime shared:518

           Because the propagation type of the parent mount, /mnt, was shared,
           the recursive bind mount propagated a small subtree of mounts under
           the  slave  mount  /mnt into "ns2", as can be verified by executing
           the following command in that shell session:

               ns2# grep /mnt /proc/self/mountinfo | sed 's/ - .*//'
               1239 1204 8:5 /mnt /mnt rw,relatime master:344
               1240 1239 0:56 / /mnt/x rw,relatime
               1241 1240 0:57 / /mnt/x/y rw,relatime
               1244 1239 0:56 / /mnt/ppp rw,relatime
               1245 1244 0:57 / /mnt/ppp/y rw,relatime master:518

           While it is not possible to umount(2) a part of the propagated sub-
           tree  (/mnt/ppp/y) in "ns2", it is possible to umount(2) the entire
           subtree, as shown by the following commands:

               ns2# umount /mnt/ppp/y
               umount: /mnt/ppp/y: not mounted.
               ns2# umount -l /mnt/ppp | sed 's/ - .*//'      # Succeeds...
               ns2# grep /mnt /proc/self/mountinfo
               1239 1204 8:5 /mnt /mnt rw,relatime master:344
               1240 1239 0:56 / /mnt/x rw,relatime
               1241 1240 0:57 / /mnt/x/y rw,relatime

       [5] The mount(2) flags MS_RDONLY, MS_NOSUID, MS_NOEXEC, and the "atime"
           flags  (MS_NOATIME,  MS_NODIRATIME,  MS_RELATIME)  settings  become
           locked when propagated from a more privileged to a less  privileged
           mount  namespace,  and  may  not  be changed in the less privileged
           mount namespace.

           This point is illustrated in the following example where, in a more
           privileged  mount  namespace, we create a bind mount that is marked
           as read-only.  For security reasons, it should not be  possible  to
           make  the  mount writable in a less privileged mount namespace, and
           indeed the kernel prevents this:

               $ sudo mkdir /mnt/dir
               $ sudo mount --bind -o ro /some/path /mnt/dir
               $ sudo unshare --user --map-root-user --mount \
                              mount -o remount,rw /mnt/dir
               mount: /mnt/dir: permission denied.

       [6] A file or directory that is a mount point in one namespace that  is
           not  a  mount point in another namespace, may be renamed, unlinked,
           or removed (rmdir(2)) in the mount namespace in which it is  not  a
           mount  point  (subject  to  the  usual  permission checks).  Conse-
           quently, the mount point is removed in the mount namespace where it
           was a mount point.

           Previously  (before  Linux  3.18), attempting to unlink, rename, or
           remove a file or directory that was a mount point in another  mount
           namespace would result in the error EBUSY.  That behavior had tech-
           nical problems of enforcement (e.g., for NFS) and permitted denial-
           of-service  attacks against more privileged users (i.e., preventing
           individual files from being updated by  bind  mounting  on  top  of
           them).

EXAMPLES
       See pivot_root(2).

SEE ALSO
       unshare(1),   clone(2),   mount(2),   mount_setattr(2),  pivot_root(2),
       setns(2), umount(2),  unshare(2),  proc(5),  namespaces(7),  user_name-
       spaces(7),   findmnt(8),   mount(8),  pam_namespace(8),  pivot_root(8),
       umount(8)

       Documentation/filesystems/sharedsubtree.rst in the kernel source tree.

Linux man-pages 6.03              2023-02-10               mount_namespaces(7)
Generated by dwww version 1.15 on Fri Jun 21 07:20:00 CEST 2024.