Posted to issues@mesos.apache.org by "Rob Johnson (JIRA)" <ji...@apache.org> on 2017/09/12 16:38:00 UTC

[jira] [Updated] (MESOS-7966) check for maintenance on agent causes fatal error

     [ https://issues.apache.org/jira/browse/MESOS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rob Johnson updated MESOS-7966:
-------------------------------
    Summary: check for maintenance on agent causes fatal error  (was: check for maintenance on slave causes fatal error)

> check for maintenance on agent causes fatal error
> -------------------------------------------------
>
>                 Key: MESOS-7966
>                 URL: https://issues.apache.org/jira/browse/MESOS-7966
>             Project: Mesos
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 1.1.0
>            Reporter: Rob Johnson
>
> We interact with the Operator API frequently to gracefully drain tasks from agents without impacting service availability.
> We seem to trigger a fatal error in the Mesos master when interacting with the Mesos API. This happens relatively frequently, and impacts us when downstream frameworks (Marathon) react badly to the resulting leader elections.
> Here is the log line that we see when the master dies:
> {code}
> F0911 12:18:49.543401 123748 hierarchical.cpp:872] Check failed: slaves[slaveId].maintenance.isSome()
> {code}
> It's quite possible we're using the maintenance API in the wrong way. We're happy to provide any other logs you need - please let me know what would be useful for debugging.
> Thanks.
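
For readers unfamiliar with the draining workflow the report refers to, here is a minimal sketch of the request bodies involved, assuming the v0 master endpoints /maintenance/schedule and /machine/down from the Mesos maintenance documentation. The helper names, hostnames, and IP addresses are hypothetical and are not taken from the reporter's tooling. The pre-flight check reflects the documented rule that a machine must be in the schedule before it is brought down; it is not a confirmed diagnosis of the check failure quoted above.

```python
import json

NANOS_PER_SECOND = 1_000_000_000


def build_maintenance_schedule(machines, start_seconds, duration_seconds):
    """Build the JSON body for POST /maintenance/schedule (one window).

    machines: list of (hostname, ip) tuples. Hypothetical helper.
    """
    return {
        "windows": [
            {
                "machine_ids": [
                    {"hostname": hostname, "ip": ip} for hostname, ip in machines
                ],
                "unavailability": {
                    "start": {"nanoseconds": start_seconds * NANOS_PER_SECOND},
                    "duration": {"nanoseconds": duration_seconds * NANOS_PER_SECOND},
                },
            }
        ]
    }


def unscheduled_machines(schedule, machine_ids):
    """Return the machine_ids that do not appear in any schedule window.

    Intended as a client-side check before POST /machine/down, whose body
    is a JSON list of machine_ids. Hypothetical helper.
    """
    scheduled = {
        (m.get("hostname"), m.get("ip"))
        for window in schedule.get("windows", [])
        for m in window.get("machine_ids", [])
    }
    return [
        m for m in machine_ids
        if (m.get("hostname"), m.get("ip")) not in scheduled
    ]


if __name__ == "__main__":
    # Example values only: one agent, a fixed start time, a one-hour window.
    schedule = build_maintenance_schedule(
        [("agent-1.example.com", "10.0.0.1")], 1505232000, 3600
    )
    print(json.dumps(schedule, indent=2))

    to_drain = [{"hostname": "agent-2.example.com", "ip": "10.0.0.2"}]
    missing = unscheduled_machines(schedule, to_drain)
    if missing:
        print("not in schedule, skip /machine/down:", json.dumps(missing))
```

The sketch makes no network calls; it only builds and validates payloads, so it can be run without a Mesos cluster.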



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)