Posted to issues@mesos.apache.org by "Alexander Rukletsov (JIRA)" <ji...@apache.org> on 2017/09/01 08:46:00 UTC
[jira] [Updated] (MESOS-7872) Scheduler hang when registration fails.
[ https://issues.apache.org/jira/browse/MESOS-7872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexander Rukletsov updated MESOS-7872:
---------------------------------------
Summary: Scheduler hang when registration fails. (was: Scheduler hang when registration fails (due to bad role))
> Scheduler hang when registration fails.
> ---------------------------------------
>
> Key: MESOS-7872
> URL: https://issues.apache.org/jira/browse/MESOS-7872
> Project: Mesos
> Issue Type: Bug
> Components: scheduler driver
> Affects Versions: 1.4.0
> Reporter: Till Toenshoff
> Assignee: Alexander Rukletsov
> Labels: framework, reliability, scheduler
> Fix For: 1.5.0
>
>
> I'm finding that if framework registration fails, the Mesos driver client hangs indefinitely after logging the following output:
> {noformat}
> I0809 20:04:22.479391 73 sched.cpp:1187] Got error ''FrameworkInfo.role' is not a valid role: Role '/test/role/slashes' cannot start with a slash'
> I0809 20:04:22.479658 73 sched.cpp:2055] Asked to abort the driver
> I0809 20:04:22.479843 73 sched.cpp:1233] Aborting framework
> {noformat}
> I'd have expected one or both of the following:
> - SchedulerDriver.run() should have exited with a failed Protos.Status of some form
> - Scheduler.error() should have been invoked when the "Got error" occurred
> Steps to reproduce:
> - Launch a scheduler instance and have it register with a known-bad FrameworkInfo. In this case, the role '/test/role/slashes', which starts with a slash, was used.
> - Observe that the scheduler continues in a TASK_RUNNING state despite the failed registration. From all appearances, the Scheduler implementation isn't invoked at all.
> I'd guess that because this failure happens before framework registration, there's some error handling that isn't fully initialized at this point.
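For illustration only, the rejected input from the reproduction steps can be caught on the client side before the driver is started. The sketch below is not part of the Mesos API: `role_start_error` is a hypothetical helper that mirrors just the one rule visible in the log output ("cannot start with a slash"). The master may apply further role checks that are not modeled here.

```python
def role_start_error(role):
    """Return an error string if `role` starts with a slash, else None.

    Hypothetical helper: it mirrors only the single rule shown in the
    log output of this issue; other role checks are not modeled here.
    """
    if role.startswith("/"):
        return "Role '%s' cannot start with a slash" % role
    return None


if __name__ == "__main__":
    # The first role is the one from the report; the second is the same
    # role without the leading slash, used only for contrast.
    for role in ("/test/role/slashes", "test/role/slashes"):
        error = role_start_error(role)
        print(role, "->", error or "passes this check")
```

A check like this only avoids one known trigger. It does not address the hang itself, which is the subject of this issue.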
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)