Posted to issues@mesos.apache.org by "Bernd Mathiske (JIRA)" <ji...@apache.org> on 2014/11/11 10:32:34 UTC

[jira] [Created] (MESOS-2068) Add comments that explain framework, executor ID, and task life cycle in slave

Bernd Mathiske created MESOS-2068:
-------------------------------------

             Summary: Add comments that explain framework, executor ID, and task life cycle in slave
                 Key: MESOS-2068
                 URL: https://issues.apache.org/jira/browse/MESOS-2068
             Project: Mesos
          Issue Type: Improvement
          Components: slave
            Reporter: Bernd Mathiske
            Assignee: Bernd Mathiske
            Priority: Minor


Fixing MESOS-947 was relatively difficult because the source code is practically the only source of information about the life cycle of frameworks, executors, and tasks in the slave. In particular, this leads to confusion about whether a task lost state can occur at the beginning of _runTask() when the framework is NULL. This shall be explained to the best of the assignee's knowledge.

For context see https://reviews.apache.org/r/27567
with these comments:

On Nov. 5, 2014, 7:50 p.m., Ben Mahler wrote:
src/slave/slave.cpp, lines 1195-1200
<https://reviews.apache.org/r/27567/diff/1/?file=748326#file748326line1195>

   A comment here as to why we don't need to send TASK_LOST would be much appreciated! It's not obvious so someone might come along and add a TASK_LOST to make sure we're not dropping the task on the floor, so context here would be great!

Bernd Mathiske wrote:
   Hah, thanks for sharing - I am not alone! :-) None of this was obvious to me either, because there is no comment explaining the general life cycle of anything. Once you understand the intended life cycle, there is no way there can be a TASK_LOST situation here, though. Therefore I propose adding comments describing the overall picture regarding frameworks, executor IDs and task creation in the appropriate places instead. I'll file a ticket if you agree.

Once you understand the intended life cycle, there is no way there can be a TASK_LOST situation here, though.

Phew! :)

Could you distill your learnings into a comment here, and maybe make the log message more informative? Even with an overall description as you mentioned, dummies like me would still get confused here given the lack of _local_ context. ;)

- Ben



