Posted to yarn-issues@hadoop.apache.org by "Weiwei Yang (JIRA)" <ji...@apache.org> on 2018/11/06 02:38:00 UTC

[jira] [Comment Edited] (YARN-8902) Add volume manager that manages CSI volume lifecycle

    [ https://issues.apache.org/jira/browse/YARN-8902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16676053#comment-16676053 ] 

Weiwei Yang edited comment on YARN-8902 at 11/6/18 2:37 AM:
------------------------------------------------------------

Hi [~leftnoteasy]

Thanks for the review comments. Please see my responses below.
{quote}CsiAdaptorClient is not implemented, does this patch works end to end?
{quote}
This task focuses on the RM-side changes. The adaptor will be deployed on the NM and will be implemented in YARN-8953. The interface is added here because I've created some fake implementations, used by the unit tests, to exercise the volume manager functions.
{quote}How to handle client ask volumes for every allocate request (let's say same volume id)? What will the expectation be for users, should they expect failures for the allocate() call or duplicated volume id will be simply ignored?
{quote}
The volume manager tracks all known volume states; see the {{VolumeStates}} class for details. If a client asks for the same volume (by specifying the same pre-provisioned volume ID), we just ensure the volume transitions to the desired state, i.e. the {{NODE_READY}} state (which means controller publish is already done). So if the volume is new, the volume manager validates it and then performs the publish operation; if the volume is already published, no operation is needed.
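As a rough illustration of the behavior described above (a minimal sketch, not the code in the patch), repeated requests for the same volume ID can be handled by driving the volume toward {{NODE_READY}} and doing nothing once it is there. The class name, the {{NEW}} and {{VALIDATED}} state names, and the call counter are all hypothetical; only {{NODE_READY}} comes from the discussion.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; names other than NODE_READY are assumptions, not patch code.
class VolumeTrackerSketch {
  enum State { NEW, VALIDATED, NODE_READY }

  private final Map<String, State> states = new HashMap<>();
  int publishCalls = 0; // counts simulated controller-publish operations

  // Drive the volume to NODE_READY; a volume already in that state needs no work.
  synchronized State ensureNodeReady(String volumeId) {
    State current = states.getOrDefault(volumeId, State.NEW);
    if (current == State.NEW) {
      current = State.VALIDATED;   // stand-in for the validation step
    }
    if (current == State.VALIDATED) {
      publishCalls++;              // stand-in for the controller publish call
      current = State.NODE_READY;
    }
    states.put(volumeId, current);
    return current;
  }

  public static void main(String[] args) {
    VolumeTrackerSketch tracker = new VolumeTrackerSketch();
    tracker.ensureNodeReady("vol-1"); // first ask: validate, then publish
    tracker.ensureNodeReady("vol-1"); // same ID again: no operation needed
    System.out.println("publish calls: " + tracker.publishCalls);
  }
}
```

With this shape, a duplicated volume ID in a later allocate request is neither an error nor silently dropped; it is simply a request whose desired state has already been reached.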
{quote}How to handle RM recovery case for volumes, are we going to recover volume states? or do we need to do that?
{quote}
Not necessarily; I think we can do this in a stateless manner. According to the CSI spec, for example:
{noformat}
ControllerPublishVolume

This operation MUST be idempotent. If the volume corresponding to the {{volume_id}} has already been published at the node corresponding to the {{node_id}}, and is compatible with the specified {{volume_capability}} and {{readonly}} flag, the Plugin MUST reply {{0 OK}}.

{noformat}
That means we are allowed to call e.g. {{ControllerPublishVolume}} multiple times even if a volume is already published. Most of the APIs are defined as idempotent. So during recovery, we could just reset the volume to new and start all over again; the driver should respond OK.
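The recovery argument can be sketched as follows. This is only an illustration under stated assumptions: {{FakeDriver}} and {{Manager}} are hypothetical classes, and the fake driver merely imitates the idempotent contract the CSI spec requires of {{ControllerPublishVolume}}; it is not a real CSI client.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of stateless recovery on top of an idempotent publish call.
class RecoverySketch {
  // Fake driver: publishing an already-published volume succeeds and changes nothing.
  static class FakeDriver {
    private final Set<String> published = new HashSet<>();
    int calls = 0;

    boolean controllerPublish(String volumeId, String nodeId) {
      calls++;
      published.add(volumeId + "@" + nodeId); // adding the same key twice is a no-op
      return true;                            // stands in for the "0 OK" reply
    }

    int publishedCount() {
      return published.size();
    }
  }

  // RM-side manager that keeps volume state only in memory, so it is lost on restart.
  static class Manager {
    private final Set<String> ready = new HashSet<>();
    private final FakeDriver driver;

    Manager(FakeDriver driver) {
      this.driver = driver;
    }

    boolean ensurePublished(String volumeId, String nodeId) {
      String key = volumeId + "@" + nodeId;
      if (ready.contains(key)) {
        return true; // already known to be published; nothing to do
      }
      boolean ok = driver.controllerPublish(volumeId, nodeId);
      if (ok) {
        ready.add(key);
      }
      return ok;
    }
  }

  public static void main(String[] args) {
    FakeDriver driver = new FakeDriver();
    new Manager(driver).ensurePublished("vol-1", "node-1"); // before the restart
    // After a restart the manager starts empty and simply publishes again.
    boolean ok = new Manager(driver).ensurePublished("vol-1", "node-1");
    System.out.println("recovered ok: " + ok + ", published: " + driver.publishedCount());
  }
}
```

The second manager instance has no memory of the first publish, yet the repeated call succeeds and the driver still holds a single published volume, which is why no volume state needs to be persisted in this approach.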

Thanks


> Add volume manager that manages CSI volume lifecycle
> ----------------------------------------------------
>
>                 Key: YARN-8902
>                 URL: https://issues.apache.org/jira/browse/YARN-8902
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Weiwei Yang
>            Assignee: Weiwei Yang
>            Priority: Major
>         Attachments: YARN-8902.001.patch, YARN-8902.002.patch, YARN-8902.003.patch, YARN-8902.004.patch, YARN-8902.005.patch, YARN-8902.006.patch, YARN-8902.007.patch
>
>
> The CSI volume manager is a service running in the RM process that manages the lifecycle of all CSI volumes. Details about the volume lifecycle states can be found in the [CSI spec|https://github.com/container-storage-interface/spec/blob/master/spec.md].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
