Posted to dev@mesos.apache.org by "Joe Smith (JIRA)" <ji...@apache.org> on 2012/10/25 03:44:12 UTC

[jira] [Commented] (MESOS-295) Registrar to persist global state

    [ https://issues.apache.org/jira/browse/MESOS-295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483814#comment-13483814 ] 

Joe Smith commented on MESOS-295:
---------------------------------

To clarify, the goal is to prevent a situation in which tasks become irrevocably lost.

There should be a persistent datastore (I've heard benh call it a registrar) that saves the current state of the cluster. This will allow a new master to come up and have knowledge of the previous state of the cluster (including slaves that were previously thought lost) so it can take the right corrective action if/when they come back online.
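
As a rough illustration of the idea above, here is a minimal sketch in Python. It is not the Mesos implementation or API: the `Registrar` and `Master` classes, the JSON file backing store, and the "kill tasks on a slave that was recorded as LOST" policy are all assumptions made for the example. The point is only that state written by the old master is visible to the new one after failover.

```python
import json
import os
import tempfile
from pathlib import Path


class Registrar:
    """Hypothetical persistent store for cluster state (assumed: one JSON file)."""

    def __init__(self, path):
        self.path = Path(path)

    def load(self):
        # An empty registry is returned when nothing has been persisted yet.
        if not self.path.exists():
            return {}
        return json.loads(self.path.read_text())

    def save(self, state):
        # Write to a temporary file, then replace, so a crash mid-write
        # does not leave a truncated registry behind.
        tmp = self.path.with_name(self.path.name + ".tmp")
        tmp.write_text(json.dumps(state))
        os.replace(tmp, self.path)


class Master:
    """Hypothetical master that recovers slave state from the registrar."""

    def __init__(self, registrar):
        self.registrar = registrar
        # slave_id -> {"status": "ACTIVE" | "LOST", "tasks": [...]}
        self.slaves = registrar.load()

    def register_slave(self, slave_id, tasks):
        self.slaves[slave_id] = {"status": "ACTIVE", "tasks": list(tasks)}
        self.registrar.save(self.slaves)

    def mark_lost(self, slave_id):
        self.slaves[slave_id]["status"] = "LOST"
        self.registrar.save(self.slaves)

    def on_reconnect(self, slave_id, running_tasks):
        """Return the tasks to kill when a slave comes back online.

        Assumed policy: tasks on a slave already recorded as LOST were
        reported lost to frameworks, so anything still running is killed.
        """
        record = self.slaves.get(slave_id)
        if record is not None and record["status"] == "LOST":
            return list(running_tasks)
        return []


def demo():
    """Walk through the scenario: slave lost, master failover, slave returns."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "registry.json")
        old_master = Master(Registrar(path))
        old_master.register_slave("slave-1", ["task-a", "task-b"])
        old_master.mark_lost("slave-1")
        # Failover: the new master starts with only the persisted state.
        new_master = Master(Registrar(path))
        return new_master.on_reconnect("slave-1", ["task-a", "task-b"])
```

Without the persisted record, the new master in `demo()` would have no way to know that `slave-1` had been marked lost, and `on_reconnect` would return an empty list, which corresponds to the tasks staying up and holding resources.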
                
> Registrar to persist global state 
> ----------------------------------
>
>                 Key: MESOS-295
>                 URL: https://issues.apache.org/jira/browse/MESOS-295
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Joe Smith
>
> 1) machines lose network connectivity and their tasks are marked LOST
> 2) the master fails over to a new master
> 3) machines come back up
> 4) tasks stay running and aren't killed, taking up resources
