Posted to issues@mesos.apache.org by "James Peach (JIRA)" <ji...@apache.org> on 2018/10/08 18:51:00 UTC

[jira] [Commented] (MESOS-9300) XFS isolator can mislabel project IDs on persistence volumes.

    [ https://issues.apache.org/jira/browse/MESOS-9300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16642305#comment-16642305 ] 

James Peach commented on MESOS-9300:
------------------------------------

macOS has [ATTR_DIR_MOUNTSTATUS|https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man2/getattrlist.2.html#//apple_ref/doc/man/2/getattrlist], but AFAIK there's no straightforward equivalent on Linux.

However, it looks like we can detect this on Linux with the [EXDEV rename trick|http://blog.schmorp.de/2016-03-03-detecting-a-mount-point.html].
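
For illustration, here is a minimal sketch of how I understand that trick. It assumes that Linux rename(2) compares the mounts of the two parent directories (failing with EXDEV) before it rejects a final "." path component (failing with EBUSY). The exact paths used in the linked post may differ. The function name `is_mount_point` and the injectable `rename` parameter are hypothetical, and this is Python rather than the C++ the isolator would need.

```python
import errno
import os


def is_mount_point(path, rename=os.rename):
    """Return True if `path` looks like a mount point (bind mounts included).

    Linux-only sketch. `rename` is injectable so the errno handling can be
    exercised without real mounts; by default it is os.rename.
    """
    src = os.path.join(path, ".")             # parent resolves inside `path`
    dst = os.path.join(path, os.pardir, ".")  # parent resolves to `path/..`
    try:
        rename(src, dst)
    except OSError as e:
        if e.errno == errno.EXDEV:
            return True   # the two parents live on different mounts
        if e.errno == errno.EBUSY:
            return False  # same mount; the "." component was rejected
        raise             # ENOENT, ENOTDIR, EACCES, ...
    # Renaming "." should never succeed, so treat success as a bug.
    raise RuntimeError("rename of '.' unexpectedly succeeded")
```

If this holds up, the recursive project ID assignment could skip any directory for which such a check returns true. Note that "/" would be reported as not a mount point, because "/.." resolves to "/" itself.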

> XFS isolator can mislabel project IDs on persistence volumes.
> -------------------------------------------------------------
>
>                 Key: MESOS-9300
>                 URL: https://issues.apache.org/jira/browse/MESOS-9300
>             Project: Mesos
>          Issue Type: Bug
>          Components: agent
>            Reporter: James Peach
>            Assignee: James Peach
>            Priority: Major
>
> What happens here is that we are erroneously applying the sandbox's project ID to the persistent volume.
> First, the filesystem/linux isolator bind mounts the persistent volume into the sandbox:
> {noformat}
> I1003 06:49:21.907644 2812466 linux.cpp:593] Mounting '/srv/mesos/work/volumes/roles/pie.mobius/21cb2eb6-b3e5-46f2-944e-8f6e5db9f07f' to '/srv/mesos/work/slaves/909cff92-8e17-41bf-a251-9b5eb6186c35-S0/frameworks/363e6d80-8c38-46cf-815f-2fbf60a62628-0309/executors/mobius-mloop-1538549013_438156792-v2-shared-volume.pod1.writer-job.0.e93hs3uips2i9_1/runs/9e5770a7-9f78-46dc-9264-3e80be0e40cc/shared' for persistent volume disk(allocated: pie.mobius)(reservations: [(DYNAMIC,pie.mobius,jarvis-principal,\{podInstance: e93hs3uips2i9, pod: pod1, service: mobius-mloop-1538549013_438156792-v2-shared-volume})])[21cb2eb6-b3e5-46f2-944e-8f6e5db9f07f:shared]<SHARED>:1 of container 9e5770a7-9f78-46dc-9264-3e80be0e40cc
> {noformat}
> Next, the `disk/xfs` isolator assigns a project ID to the sandbox:
> {noformat}
> I1003 06:49:21.920197 2812452 disk.cpp:402] Assigned project 6806 to '/srv/mesos/work/slaves/909cff92-8e17-41bf-a251-9b5eb6186c35-S0/frameworks/363e6d80-8c38-46cf-815f-2fbf60a62628-0309/executors/mobius-mloop-1538549013_438156792-v2-shared-volume.pod1.writer-job.0.e93hs3uips2i9_1/runs/9e5770a7-9f78-46dc-9264-3e80be0e40cc'
> {noformat}
> Note that when this happens, the isolator recursively applies the project ID to the contents of the sandbox. It doesn't follow symlinks or cross devices when it does this, but on Linux, a bind mount would not trigger either of these conditions, so the traversal descends into the persistent volume.
> Finally, the `disk/xfs` isolator tries to assign a project ID to the persistent volume as it is used by the task:
> {noformat}
> F1003 06:49:21.920577 2812452 disk.cpp:532] Check failed: scheduledProjects.contains(projectId.get()) untracked project ID 6806 for volume ID 21cb2eb6-b3e5-46f2-944e-8f6e5db9f07f on /srv/mesos/work/volumes/roles/pie.mobius/21cb2eb6-b3e5-46f2-944e-8f6e5db9f07f
> {noformat}
> This check fails because, if the persistent volume has a project ID, we expect that it has already been scheduled for reclamation. However, its project ID is the one we assigned to the sandbox. We don't schedule the sandbox for reclamation until cleanup, so (fortunately) the invariant check triggers.
> So, apart from triggering the CHECK, the root cause of this is that we are altering the project ID of the persistent volume, which permanently misattributes the corresponding quota.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)