Posted to issues@mesos.apache.org by "Neil Conway (JIRA)" <ji...@apache.org> on 2016/08/25 20:51:21 UTC

[jira] [Comment Edited] (MESOS-6090) Change master to always update registry before in-memory state

    [ https://issues.apache.org/jira/browse/MESOS-6090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15437641#comment-15437641 ] 

Neil Conway edited comment on MESOS-6090 at 8/25/16 8:50 PM:
-------------------------------------------------------------

On removal, we currently do the following:

1. Add to {{removing}}, add to {{removed}}, remove from {{registered}}.
2. Perform the registry operation.

The proposal is to reorder these operations: perform the registry operation first, and update in-memory state only after it has succeeded. As discussed above, updating in-memory state such as {{registered}} before the registry operation succeeds can leak inaccurate information via the HTTP endpoints. As it happens, it also makes it harder to implement GC for the unreachable list, but that's a separate matter.
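
A minimal Python sketch of the two orderings, assuming a greatly simplified model. It is not the Mesos master code. {{Registry}}, {{Master}}, {{RegistryError}}, {{demo}} and the {{remove_slave_*}} methods are hypothetical stand-ins. A failed registry write is modeled as an exception, standing in for the failover case described in the issue, and {{removing}} is left out for brevity. With the in-memory-first ordering, the endpoint view diverges from what the registry durably recorded. With the registry-first ordering, it does not.

```python
from dataclasses import dataclass, field


class RegistryError(Exception):
    """Hypothetical: raised when the simulated registry write fails (e.g. failover)."""


@dataclass
class Registry:
    """Hypothetical stand-in for the durable, replicated registry."""
    registered: set = field(default_factory=set)
    fail_next: bool = False  # when True, the next write fails once

    def remove_slave(self, slave_id):
        if self.fail_next:
            self.fail_next = False
            raise RegistryError(f"registry update for {slave_id} failed")
        self.registered.discard(slave_id)


@dataclass
class Master:
    """Hypothetical stand-in for the master's in-memory state."""
    registry: Registry
    registered: set = field(default_factory=set)
    removed: set = field(default_factory=set)

    def endpoint_view(self):
        # What an HTTP endpoint would report: read straight from in-memory state.
        return sorted(self.registered)

    def remove_slave_memory_first(self, slave_id):
        # Current ordering: mutate in-memory state, then update the registry.
        self.registered.discard(slave_id)
        self.removed.add(slave_id)
        self.registry.remove_slave(slave_id)

    def remove_slave_registry_first(self, slave_id):
        # Proposed ordering: touch in-memory state only after the registry
        # update has succeeded; if it raises, nothing observable changes.
        self.registry.remove_slave(slave_id)
        self.registered.discard(slave_id)
        self.removed.add(slave_id)


def demo(remove):
    """Remove slave 's1' while the registry write fails; return both views."""
    registry = Registry(registered={"s1", "s2"})
    master = Master(registry, registered={"s1", "s2"})
    registry.fail_next = True
    try:
        remove(master, "s1")
    except RegistryError:
        pass
    return master.endpoint_view(), sorted(registry.registered)


print(demo(Master.remove_slave_memory_first))    # (['s2'], ['s1', 's2'])
print(demo(Master.remove_slave_registry_first))  # (['s1', 's2'], ['s1', 's2'])
```

In the first case the endpoint has already reported {{s1}} as gone, although the registry (and therefore a recovering master) still has it. In the second case the two views agree.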

My comments above about reregistration were mistaken: the code paths that need to change here are those for removing slaves and for marking them unreachable, not those for removing and reregistering slaves. I've updated the description.


> Change master to always update registry before in-memory state
> --------------------------------------------------------------
>
>                 Key: MESOS-6090
>                 URL: https://issues.apache.org/jira/browse/MESOS-6090
>             Project: Mesos
>          Issue Type: Improvement
>          Components: master
>            Reporter: Neil Conway
>            Assignee: Neil Conway
>              Labels: mesosphere
>
> When a new slave attempts to register, the registry is updated first, then the master's in-memory state is updated if the registry operation is applied successfully. However, when a slave is removed or reregisters, the master first updates its in-memory state, then updates the registry. This has two problems:
> 1. It makes it harder to reason about the correctness of concurrent operations that read in-memory state and update the registry.
> 2. It can leak incorrect information via the HTTP endpoints. That is, if we update the master's in-memory state on removal or reregistration, that change will be observable via the HTTP endpoints. If the master then fails over (and the registry operation fails), the information returned via the endpoint will be incorrect. The master has special code to avoid this inaccuracy for reconciliation (see {{Master::transitioning()}}), but not for the endpoints.
> I think it is simpler to just always update the registry first.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)