Posted to issues@mesos.apache.org by "Christopher Hunt (JIRA)" <ji...@apache.org> on 2016/12/28 23:29:59 UTC

[jira] [Commented] (MESOS-6252) Do not validate start command when re-establishing connection to executor

    [ https://issues.apache.org/jira/browse/MESOS-6252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15783979#comment-15783979 ] 

Christopher Hunt commented on MESOS-6252:
-----------------------------------------

I believe it is reasonable for command-line arguments to change between executions. Arguments can reflect the state of the scheduler at the time a new executor is required. In our scenario, we wish to communicate a "seed node" to the executor so that it can connect. This seed node is communicated as an IP address, which can indeed change between invocations. Note that once communication with the seed node is established in our world, the seed nodes are gossiped and this argument becomes largely irrelevant. However, should a new executor need to be established for some reason, e.g. because the old ones have somehow died, the seed node argument once again becomes relevant.

Perhaps as a compromise, Mesos should compare only the first argument of a command, i.e. the path to the command itself, thereby accepting that a command's arguments are variable.
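
To illustrate the compromise, here is a minimal sketch in Python. It is not Mesos code: the function names `executable_of` and `commands_compatible` are hypothetical, and it assumes the command is a simple string whose first shell token is the executable path. A shell one-liner such as the one in the log below starts with a variable assignment (GLOBIGNORE=...), so a real check would need a smarter way to locate the executable.

```python
import shlex


def executable_of(command_value: str) -> str:
    """Return the first shell token of a command string ('' if empty).

    Hypothetical helper: assumes the first token is the executable path.
    """
    tokens = shlex.split(command_value)
    return tokens[0] if tokens else ""


def commands_compatible(existing: str, proposed: str) -> bool:
    """Treat two commands as compatible when only their arguments differ."""
    return executable_of(existing) == executable_of(proposed)


# Same executable, different --core-node value: considered compatible.
old = "./bin/conductr-agent --core-node 10.0.0.246:9004"
new = "./bin/conductr-agent --core-node 10.0.0.248:9004"
print(commands_compatible(old, new))
```

With a comparison like this, a changed seed node address would no longer cause the executor to be rejected, while a different executable still would.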

> Do not validate start command when re-establishing connection to executor
> -------------------------------------------------------------------------
>
>                 Key: MESOS-6252
>                 URL: https://issues.apache.org/jira/browse/MESOS-6252
>             Project: Mesos
>          Issue Type: Bug
>          Components: general
>    Affects Versions: 0.28.1
>         Environment: coreos
>            Reporter: Markus Jura
>
> When a framework reconnects to an existing executor, Mesos checks whether the new start command of the {{ExecutorInfo}} equals the old start command.
> In the case of the ConductR framework, these start commands can differ because of a different value in the ConductR agent argument {{--core-node}}.
> As a result, the Mesos master sends a {{TASK_ERROR}} to the framework for each running task. The reason given for the error is {{REASON_TASK_INVALID}}.
> {code}
> 2016-09-26T11:34:48Z ip-10-0-0-248.us-west-2.compute.internal ERROR MesosSchedulerClient [sourceThread=stop-all-bundles-1-akka.actor.default-dispatcher-22, akkaTimestamp=11:34:48.713UTC, akkaSource=akka.tcp://stop-all-bundles-1@10.0.0.248:9004/user/reaper/mesos-client-supervisor/singleton/mesos-client, sourceActorSystem=stop-all-bundles-1] - Unexpected Mesos task state TASK_ERROR received by the scheduler: task_id {
>   value: "fe65b273-61c1-4ccf-8852-bb04e2dd9380"
> }
> state: TASK_ERROR
> message: "Task has invalid ExecutorInfo (existing ExecutorInfo with same ExecutorID is not compatible).\n------------------------------------------------------------\nExisting ExecutorInfo:\nexecutor_id {\n  value: \"conductr-node-10.0.0.249-executor\"\n}\nresources {\n  name: \"cpus\"\n  type: SCALAR\n  scalar {\n    value: 0.9\n  }\n  role: \"*\"\n}\nresources {\n  name: \"mem\"\n  type: SCALAR\n  scalar {\n    value: 402.653184\n  }\n  role: \"*\"\n}\nresources {\n  name: \"disk\"\n  type: SCALAR\n  scalar {\n    value: 1000\n  }\n  role: \"*\"\n}\nresources {\n  name: \"ports\"\n  type: RANGES\n  ranges {\n    range {\n      begin: 2552\n      end: 2552\n    }\n    range {\n      begin: 10000\n      end: 10999\n    }\n  }\n  role: \"*\"\n}\ncommand {\n  uris {\n    value: \"https://downloads.mesosphere.com/java/jre-8u92-linux-x64.tar.gz\"\n    executable: false\n    extract: true\n    cache: false\n  }\n  uris {\n    value: \"http://10.0.7.185/ConductR/markusjura/conductr-agent-0.1.0.tgz\"\n    executable: false\n    extract: true\n    cache: false\n  }\n  value: \"GLOBIGNORE=\\\'*.tar.gz:*.tgz\\\' && export JAVA_HOME=$(echo $(pwd)/jre*) && ./conductr-agent-*/bin/conductr-agent -Dconfig.resource=mesos.conf -Dakka.loglevel=DEBUG -Dakka.remote.netty.tcp.port=2552 -Dconductr-agent.run.allocated-ports.start=10000 -Dconductr-agent.run.allocated-ports.end=10999 --core-node 10.0.0.246:9004 --core-system-name stop-all-bundles-1\"\n}\nframework_id {\n  value: \"stop-all-bundles-1\"\n}\nname: \"conductr-agent\"\nsource: \"conductr\"\n\n------------------------------------------------------------\nTask\'s ExecutorInfo:\nexecutor_id {\n  value: \"conductr-node-10.0.0.249-executor\"\n}\nresources {\n  name: \"cpus\"\n  type: SCALAR\n  scalar {\n    value: 0.9\n  }\n  role: \"*\"\n}\nresources {\n  name: \"mem\"\n  type: SCALAR\n  scalar {\n    value: 402.653184\n  }\n  role: \"*\"\n}\nresources {\n  name: \"disk\"\n  type: SCALAR\n  scalar {\n    value: 1000\n  }\n  role: 
\"*\"\n}\nresources {\n  name: \"ports\"\n  type: RANGES\n  ranges {\n    range {\n      begin: 2552\n      end: 2552\n    }\n    range {\n      begin: 10000\n      end: 10999\n    }\n  }\n  role: \"*\"\n}\ncommand {\n  uris {\n    value: \"https://downloads.mesosphere.com/java/jre-8u92-linux-x64.tar.gz\"\n    executable: false\n    extract: true\n    cache: false\n  }\n  uris {\n    value: \"http://10.0.7.185/ConductR/markusjura/conductr-agent-0.1.0.tgz\"\n    executable: false\n    extract: true\n    cache: false\n  }\n  value: \"GLOBIGNORE=\\\'*.tar.gz:*.tgz\\\' && export JAVA_HOME=$(echo $(pwd)/jre*) && ./conductr-agent-*/bin/conductr-agent -Dconfig.resource=mesos.conf -Dakka.loglevel=DEBUG -Dakka.remote.netty.tcp.port=2552 -Dconductr-agent.run.allocated-ports.start=10000 -Dconductr-agent.run.allocated-ports.end=10999 --core-node 10.0.0.248:9004 --core-system-name stop-all-bundles-1\"\n}\nframework_id {\n  value: \"stop-all-bundles-1\"\n}\nname: \"conductr-agent\"\nsource: \"conductr\"\n\n------------------------------------------------------------\n"
> slave_id {
>   value: "1154b639-c536-41d1-b9df-a57b24792acb-S4"
> }
> timestamp: 1.474889688506464E9
> source: SOURCE_MASTER
> reason: REASON_TASK_INVALID
> 2016-09-26T11:34:48Z ip-10-0-0-248.us-west-2.compute.internal ERROR MesosSchedulerClient [sourceThread=stop-all-bundles-1-akka.actor.default-dispatcher-22, akkaTimestamp=11:34:48.714UTC, akkaSource=akka.tcp://stop-all-bundles-1@10.0.0.248:9004/user/reaper/mesos-client-supervisor/singleton/mesos-client, sourceActorSystem=stop-all-bundles-1] - Unexpected Mesos task state TASK_ERROR received by the scheduler: task_id {
>   value: "40034b01-e853-4ada-882f-9aaab67f77c2"
> }
> {code}
> Mesos should only validate the executor ID. If the executor ID of the new {{ExecutorInfo}} object equals the old one, then it should allow the reconnection to the running executor.


