Posted to issues@mesos.apache.org by "Greg Mann (JIRA)" <ji...@apache.org> on 2015/11/04 18:22:27 UTC

[jira] [Commented] (MESOS-3388) Add an interface to allow Slave Modules to checkpoint/restore state.

    [ https://issues.apache.org/jira/browse/MESOS-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14989964#comment-14989964 ] 

Greg Mann commented on MESOS-3388:
----------------------------------

Regarding Agent restart, I'm trying to decide whether it makes sense for us to garbage collect the checkpointed state of modules that are no longer detected on Agent startup. On the one hand, it's good to leave the Agent in a clean state whenever we can. On the other, a user may restart the Agent multiple times with different modules present, and it could be useful for them to have old checkpointed module data still available. If our long-term vision is that Agent restart should be a seldom-used operator action, then perhaps leaving old module checkpoint data uncollected isn't such a big deal. If we imagine Agents being restarted frequently in order to accomplish different Attribute/Resource/Module configurations, then cleanup would be wise.
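
To make the trade-off concrete, here is a minimal sketch of what cleanup on startup might look like. It is not Mesos code: the function names, the `enabled` switch, and the one-directory-per-module-UID layout are all assumptions for illustration. Making cleanup an explicit option would let operators who restart often keep old data if they want it.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List, Set


def stale_module_dirs(modules_root: Path, loaded_uids: Set[str]) -> List[Path]:
    """Return checkpoint directories whose module UID is not currently loaded."""
    if not modules_root.is_dir():
        return []
    return sorted(
        entry for entry in modules_root.iterdir()
        if entry.is_dir() and entry.name not in loaded_uids
    )


def collect_garbage(modules_root: Path, loaded_uids: Set[str], enabled: bool) -> List[Path]:
    """Report stale checkpoint directories, deleting them only if enabled."""
    stale = stale_module_dirs(modules_root, loaded_uids)
    if enabled:
        for path in stale:
            shutil.rmtree(path)
    return stale


# Demo in a throwaway directory (hypothetical module UIDs).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "modules"
    for uid in ("module-a", "module-b", "module-c"):
        (root / uid).mkdir(parents=True)

    # Only module-a is loaded after the restart; the other two are stale.
    removed = collect_garbage(root, {"module-a"}, enabled=True)
    removed_names = [p.name for p in removed]
    remaining_names = sorted(p.name for p in root.iterdir())
    print(removed_names, remaining_names)
```

With `enabled=False` the same call only reports the stale directories, which could be logged so the operator can decide what to do with them.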

Regarding module UIDs, how will we maintain the association between a given module and its UID through an Agent failover or restart? That is, if we assign a module a UID, checkpoint some state, and then restart the Agent, how do we know what that module's UID was? Perhaps we could derive the UID from a hash of the module name?
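
A rough sketch of the hash idea, under stated assumptions: the module name is the only input, the UID is a truncated SHA-256 digest, and `module_uid`, `checkpoint_dir`, the root path, and the module names below are hypothetical. The key property is that the hash must be stable across processes; a per-process randomized hash (such as Python's built-in `hash()` on strings) would not survive a restart.

```python
import hashlib
from pathlib import PurePosixPath


def module_uid(module_name: str) -> str:
    """Derive a UID from the module name that is identical on every run."""
    digest = hashlib.sha256(module_name.encode("utf-8")).hexdigest()
    return digest[:16]  # truncated for shorter directory names (an assumption)


def checkpoint_dir(modules_root: str, module_name: str) -> PurePosixPath:
    """Map a module to its checkpoint directory under a given modules root."""
    return PurePosixPath(modules_root) / module_uid(module_name)


# The same name yields the same UID before and after a restart,
# so no extra bookkeeping is needed to find the old checkpoint.
uid_first_run = module_uid("org_example_MyIsolator")
uid_after_restart = module_uid("org_example_MyIsolator")
uid_other = module_uid("org_example_MyHook")

print(uid_first_run == uid_after_restart)
print(checkpoint_dir("/example/modules", "org_example_MyIsolator"))
```

One consequence of this design is that renaming a module orphans its old checkpoint, and two builds of a module with the same name would share a directory, so the checkpoint format would need its own versioning.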

> Add an interface to allow Slave Modules to checkpoint/restore state.
> --------------------------------------------------------------------
>
>                 Key: MESOS-3388
>                 URL: https://issues.apache.org/jira/browse/MESOS-3388
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Kapil Arya
>            Assignee: Greg Mann
>
> * This is to restore module-specific in-memory data structures that might be required by the modules to do cleanup on task exit, etc.
> * We need to define the interaction when an Agent is restarted with a different set of modules.
> One open question is how an Agent identifies a given module. One possibility is to assign a UID to the module and pass it in during `create()`. The UID would then be used to assign a checkpoint directory during checkpoint/restore (something like /tmp/mesos/.../<slaveID>/modules/<module UID>).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)