Posted to issues@mesos.apache.org by "Chun-Hung Hsiao (JIRA)" <ji...@apache.org> on 2018/10/30 21:48:00 UTC

[jira] [Commented] (MESOS-8507) SLRP discards reservations when the agent is discarded, which could lead to leaked volumes.

    [ https://issues.apache.org/jira/browse/MESOS-8507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16669372#comment-16669372 ] 

Chun-Hung Hsiao commented on MESOS-8507:
----------------------------------------

[~xujyan] We revisited this issue recently. The current decoupling of reservations and agent IDs is a legacy issue. Conceptually, each distinct agent ID is a new, different agent to Mesos, and so are its resources, so it does not make sense to keep an old reservation on "new" resources. We have this unfortunate legacy decoupling because we didn't have a proper way to preserve agent IDs at the time persistent volumes were implemented. The situation is better now, but we're not there yet: as you mentioned, there are still some cases where we need to discard an agent ID.

The current solution we (Mesosphere) have is to use the "default_reservation" field in the resource provider info to reserve pre-existing volumes to a certain role, and a special framework can register with that role to do some recovery.

Alternatives include adding CSI credential support on `NodePublishVolume` calls, so the CSI plugin can decide whether a certain workload is authorized to use the volume. This will eventually be supported, but it is not currently prioritized.

We could still work around this by somehow preserving the reservation, but I'd rather avoid this route since it conflicts with the long-term direction we'd like to move in. WDYT?

cc [~jieyu]

> SLRP discards reservations when the agent is discarded, which could lead to leaked volumes.
> -------------------------------------------------------------------------------------------
>
>                 Key: MESOS-8507
>                 URL: https://issues.apache.org/jira/browse/MESOS-8507
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Yan Xu
>            Priority: Major
>              Labels: storage
>
> In the current SLRP implementation, the reservations for new SLRP/CSI-backed volumes are checkpointed under {{<meta>/slaves/latest/resource_providers}}, so when the agent runs into incompatible configuration changes (the kinds that cannot be addressed by MESOS-1739), the operator has to remove the symlink, and the reservations are then gone.
> Then the agent recovers with a new {{SlaveInfo}} and new SLRPs are created to recover the CSI volumes. These CSI volumes will not have reservations and thus will be offered to frameworks of any role, potentially with the data already written by the previous owner. 
>  
> The framework has no control over this and no chance to clean up before the volumes are re-offered, which is undesirable for security reasons.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)