Posted to issues@mesos.apache.org by "Adam B (JIRA)" <ji...@apache.org> on 2014/08/07 09:17:12 UTC

[jira] [Commented] (MESOS-703) master fails to respect updated FrameworkInfo when the framework scheduler restarts

    [ https://issues.apache.org/jira/browse/MESOS-703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088930#comment-14088930 ] 

Adam B commented on MESOS-703:
------------------------------

Alternative workaround: (re)register the framework without passing the old FrameworkID. The framework then registers as a new framework and receives a new FrameworkID, so there is no stored FrameworkInfo to override the new one.
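To make the workaround concrete, here is a toy model of the master's registration bookkeeping. It is a hypothetical sketch, not Mesos code. The class `ToyMaster`, its `register` method, the `FrameworkInfo` dataclass, the ID format, and the user name "alice" are all assumptions. The model only mirrors the behavior reported in this issue: when a known FrameworkID re-registers, the stored FrameworkInfo wins.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, Optional


@dataclass(frozen=True)
class FrameworkInfo:
    # Hypothetical subset of the real FrameworkInfo fields.
    name: str
    user: str
    id: Optional[str] = None  # FrameworkID; None means "register as new"


class ToyMaster:
    """Toy model (not Mesos code) of the behavior reported in MESOS-703."""

    def __init__(self) -> None:
        self._frameworks: Dict[str, FrameworkInfo] = {}
        self._ids = count(1)

    def register(self, info: FrameworkInfo) -> FrameworkInfo:
        if info.id is not None and info.id in self._frameworks:
            # Failover path: a known FrameworkID re-registers. As reported,
            # the stored FrameworkInfo is kept and the new one is ignored.
            return self._frameworks[info.id]
        # Absent or unknown ID: assign a fresh FrameworkID, store this info.
        new_id = f"framework-{next(self._ids)}"
        stored = FrameworkInfo(name=info.name, user=info.user, id=new_id)
        self._frameworks[new_id] = stored
        return stored


master = ToyMaster()
# First run: registered under a personal user (hypothetical name "alice").
first = master.register(FrameworkInfo(name="marathon", user="alice"))
# Restart as "nobody" but pass the old ID: the stale user is kept.
failover = master.register(
    FrameworkInfo(name="marathon", user="nobody", id=first.id))
# Workaround: omit the old ID and register as a brand-new framework.
fresh = master.register(FrameworkInfo(name="marathon", user="nobody"))
```

In this model `failover.user` is still "alice", while `fresh` carries user "nobody" under a different FrameworkID.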

Updating FrameworkInfo for a framework with no running tasks should be trivial, but things get tricky when tasks are already running under the old FrameworkInfo:
- Should tasks running as the old user be killed and relaunched as the new user?
- Should tasks running under the old role now be accounted to the new role, and how does that affect fair sharing?
- If tasks are using resources reserved only for the old role, should they be killed?
- What if framework checkpointing or the principal changed?
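One way to frame these questions, as a hypothetical sketch rather than anything Mesos implements, is to diff the stored and incoming FrameworkInfo. The update is accepted outright only when no tasks are running. Otherwise the sketch reports which changed fields would need a policy decision. The function names `changed_fields` and `plan_update`, the return strings, and the field subset are assumptions for illustration.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class FrameworkInfo:
    # Hypothetical subset: only the fields raised in the comment above.
    user: str
    role: str
    checkpoint: bool
    principal: str


def changed_fields(old: FrameworkInfo, new: FrameworkInfo) -> List[str]:
    """Names of fields whose values differ, in declaration order."""
    return [f.name for f in fields(FrameworkInfo)
            if getattr(old, f.name) != getattr(new, f.name)]


def plan_update(old: FrameworkInfo, new: FrameworkInfo,
                running_tasks: int) -> str:
    changed = changed_fields(old, new)
    if not changed:
        return "no-op"
    if running_tasks == 0:
        # The "trivial" case: no running task depends on the old info.
        return "apply"
    # The tricky case: each changed field raises an open question for the
    # running tasks (relaunch as new user? re-account to new role? ...).
    return "needs-policy: " + ", ".join(changed)


old = FrameworkInfo(user="alice", role="*", checkpoint=False, principal="marathon")
new = FrameworkInfo(user="nobody", role="*", checkpoint=False, principal="marathon")
```

Here `plan_update(old, new, 0)` returns "apply", while `plan_update(old, new, 3)` returns "needs-policy: user". The sketch deliberately stops at flagging the conflict, since the comment leaves the actual policy open.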

> master fails to respect updated FrameworkInfo when the framework scheduler restarts
> -----------------------------------------------------------------------------------
>
>                 Key: MESOS-703
>                 URL: https://issues.apache.org/jira/browse/MESOS-703
>             Project: Mesos
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.14.0
>         Environment: ubuntu 13.04, mesos 0.14.0-rc3
>            Reporter: Jordan Curzon
>
> When I first ran Marathon, it ran as a personal user and registered with mesos-master as that user, because I had put an empty string in the user field. When I restarted Marathon as "nobody", tasks were still being run as the personal user, which didn't exist on the slaves. I know Marathon was sending a FrameworkInfo with "nobody" as the user because I hard-coded it. The tasks wouldn't run as "nobody" until I restarted mesos-master. Each time I restarted the Marathon framework, it re-registered with mesos-master, and mesos-master logged that it had detected a failover because the scheduler went away and came back.
> I understand the scheduler failover, but shouldn't mesos-master respect an updated FrameworkInfo when the scheduler re-registers?


