Posted to issues@mesos.apache.org by "Rob Johnson (JIRA)" <ji...@apache.org> on 2017/09/12 16:37:02 UTC

[jira] [Created] (MESOS-7966) check for maintenance on slave causes fatal error

Rob Johnson created MESOS-7966:
----------------------------------

             Summary: check for maintenance on slave causes fatal error
                 Key: MESOS-7966
                 URL: https://issues.apache.org/jira/browse/MESOS-7966
             Project: Mesos
          Issue Type: Bug
          Components: master
    Affects Versions: 1.1.0
            Reporter: Rob Johnson


We interact with the Operator API frequently to gracefully drain tasks from agents without impacting service availability.
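
For readers unfamiliar with the maintenance workflow, here is a minimal sketch of the request bodies involved in draining an agent. It is illustrative only and is not the reporter's actual tooling. It assumes the JSON shapes described in the Mesos maintenance documentation for the master's `/maintenance/schedule`, `/machine/down` and `/machine/up` endpoints; the helper names, hostname and IP address are hypothetical, and no request is actually sent.

```python
import json

NANOS_PER_SECOND = 1_000_000_000


def build_machine_list(machines):
    """Return the machine ID list used as the body for /machine/down and /machine/up.

    `machines` is a list of (hostname, ip) tuples; this input format is an
    assumption made for the sketch.
    """
    return [{"hostname": hostname, "ip": ip} for hostname, ip in machines]


def build_maintenance_schedule(machines, start_seconds, duration_seconds):
    """Return a schedule body with a single maintenance window.

    Times are passed in seconds and converted to nanoseconds, which is the
    unit the schedule format is assumed to use.
    """
    return {
        "windows": [
            {
                "machine_ids": build_machine_list(machines),
                "unavailability": {
                    "start": {"nanoseconds": start_seconds * NANOS_PER_SECOND},
                    "duration": {"nanoseconds": duration_seconds * NANOS_PER_SECOND},
                },
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical agent; replace with real inventory data.
    machines = [("agent-1.example.com", "10.0.0.1")]
    schedule = build_maintenance_schedule(machines, 1505232000, 3600)
    print(json.dumps(schedule, indent=2))
```

In this sketch a drain would post the schedule first, wait for tasks to move, then post the machine list to `/machine/down`, and finally to `/machine/up` once maintenance is complete.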

These interactions sometimes trigger a fatal error in the Mesos master. It happens often enough to affect us, because downstream frameworks (Marathon) react badly to the resulting leader elections.

Here is the log line that we see when the master dies:

{code}
F0911 12:18:49.543401 123748 hierarchical.cpp:872] Check failed: slaves[slaveId].maintenance.isSome()
{code}

It's quite possible that we're using the maintenance API in the wrong way. We're happy to provide any other logs you need; please let me know what would be useful for debugging.

Thanks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)