Posted to issues@mesos.apache.org by "Vinod Kone (JIRA)" <ji...@apache.org> on 2018/01/08 21:36:03 UTC

[jira] [Updated] (MESOS-8405) Update master task loss handling.

     [ https://issues.apache.org/jira/browse/MESOS-8405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kone updated MESOS-8405:
------------------------------
    Description: 
From [~vinodkone] in [r/64940|https://reviews.apache.org/r/64940/]:

{quote}
Ideally, we want terminal but unacknowledged tasks to still be marked unreachable in some way, either via the task state being TASK_UNREACHABLE or via the task being present in unreachableTasks. This would allow, for example, the WebUI to hide sandbox links for unreachable tasks, regardless of whether they were terminal before going unreachable.

But doing this is tricky for various reasons:

--> updateTask() doesn't allow a terminal state to be transitioned to TASK_UNREACHABLE. Right now, when we call updateTask() for a terminal task, it still adds a TASK_UNREACHABLE status to Task.statuses and sends it to operator API stream subscribers, which looks incorrect. In retrospect, having updateTask() internally handle already-terminal tasks was a bad design decision; I think callers should instead avoid calling it for terminal tasks.

--> It's not clear to our users what a "completed" task means. The intention was for completed tasks to be a cache of terminal and acknowledged tasks that stores recent history. WebUI users probably equate "Completed Tasks" with terminal tasks regardless of their acknowledgement status, which is why it confuses them to see terminal but unacknowledged tasks in the "Active tasks" section of the WebUI.

--> When a framework reconciles the state of a task on an unreachable agent, the master replies with TASK_UNREACHABLE regardless of whether the task was non-terminal, terminal but unacknowledged, or terminal and acknowledged when the agent went unreachable.

I think the direction we want to go in is:

--> Completed tasks should consist of terminal unacknowledged and terminal acknowledged tasks, likely in two different data structures.
--> Unreachable tasks should consist of all non-completed tasks on an unreachable agent. All the tasks in this map should be in the TASK_UNREACHABLE state.
{quote}
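A minimal C++ sketch of the caller-side guard suggested in the first point above. It assumes simplified, hypothetical stand-ins (`TaskState`, `Task`, `isTerminal()`, `updateTask()`, `markUnreachable()`) rather than the real Mesos master types, and `isTerminal()` covers only a subset of Mesos task states. Under this approach, `updateTask()` treats an already-terminal task as a precondition violation. The code that handles an unreachable agent then skips terminal tasks itself, instead of relying on `updateTask()` to filter them.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical subset of task states; not the real Mesos TaskState protobuf.
enum class TaskState {
  TASK_RUNNING,
  TASK_FINISHED,
  TASK_FAILED,
  TASK_KILLED,
  TASK_LOST,
  TASK_UNREACHABLE,
};

// Hypothetical minimal task record.
struct Task {
  std::string id;
  TaskState state;
  std::vector<TaskState> statuses;  // history of recorded status updates
};

// Terminal states in this sketch; TASK_UNREACHABLE is treated as non-terminal.
bool isTerminal(TaskState state) {
  switch (state) {
    case TaskState::TASK_FINISHED:
    case TaskState::TASK_FAILED:
    case TaskState::TASK_KILLED:
    case TaskState::TASK_LOST:
      return true;
    default:
      return false;
  }
}

// Precondition: callers never pass an already-terminal task.
void updateTask(Task& task, TaskState newState) {
  assert(!isTerminal(task.state) && "callers must filter terminal tasks");
  task.state = newState;
  task.statuses.push_back(newState);
}

// Hypothetical handler for an agent going unreachable: only non-terminal
// tasks transition to TASK_UNREACHABLE. Terminal tasks are left untouched,
// so no TASK_UNREACHABLE entry is appended to their status history.
// Returns the number of tasks that were transitioned.
int markUnreachable(std::vector<Task>& tasks) {
  int transitioned = 0;
  for (Task& task : tasks) {
    if (isTerminal(task.state)) {
      continue;
    }
    updateTask(task, TaskState::TASK_UNREACHABLE);
    ++transitioned;
  }
  return transitioned;
}
```

In this sketch, a terminal task never gets a spurious TASK_UNREACHABLE entry, because the filtering happens at the call site rather than inside updateTask().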

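The proposed direction for task bookkeeping could look roughly like the following C++ sketch. All names (`TaskEntry`, `FrameworkTasks`, `onTerminalUpdate()`, `onAcknowledged()`, `onAgentUnreachable()`) are hypothetical, not actual Mesos master code, and each task is reduced to an agent ID plus a state. Completed tasks are split into two maps: terminal-but-unacknowledged and terminal-and-acknowledged. The unreachable map only ever receives non-completed tasks, and each one is set to TASK_UNREACHABLE as it is moved in.

```cpp
#include <map>
#include <string>

// Hypothetical subset of task states for this sketch.
enum class TaskState { TASK_RUNNING, TASK_FINISHED, TASK_FAILED, TASK_UNREACHABLE };

// Hypothetical per-task record: which agent runs it and its latest state.
struct TaskEntry {
  std::string agentId;
  TaskState state;
};

struct FrameworkTasks {
  std::map<std::string, TaskEntry> active;                   // non-terminal, agent reachable
  std::map<std::string, TaskEntry> completedUnacknowledged;  // terminal, update not yet acked
  std::map<std::string, TaskEntry> completedAcknowledged;    // terminal and acked (history)
  std::map<std::string, TaskEntry> unreachable;              // always TASK_UNREACHABLE

  // A terminal status update moves an active task into the unacknowledged
  // map. Assumes `terminalState` is terminal; unknown task IDs are ignored.
  void onTerminalUpdate(const std::string& taskId, TaskState terminalState) {
    auto it = active.find(taskId);
    if (it == active.end()) return;
    TaskEntry entry = it->second;
    entry.state = terminalState;
    active.erase(it);
    completedUnacknowledged[taskId] = entry;
  }

  // The framework's acknowledgement moves the task into the acknowledged map.
  void onAcknowledged(const std::string& taskId) {
    auto it = completedUnacknowledged.find(taskId);
    if (it == completedUnacknowledged.end()) return;
    completedAcknowledged[taskId] = it->second;
    completedUnacknowledged.erase(it);
  }

  // Only non-completed (active) tasks on the agent become unreachable.
  // Completed tasks, acknowledged or not, stay in their own maps.
  void onAgentUnreachable(const std::string& agentId) {
    for (auto it = active.begin(); it != active.end();) {
      if (it->second.agentId == agentId) {
        TaskEntry entry = it->second;
        entry.state = TaskState::TASK_UNREACHABLE;
        unreachable[it->first] = entry;
        it = active.erase(it);
      } else {
        ++it;
      }
    }
  }
};
```

Keeping the two completed maps separate lets a UI treat every terminal task as "completed" while still knowing which ones are awaiting acknowledgement.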
  was:
From [~agentvindo.dev] in [r/64940|https://reviews.apache.org/r/64940/]:

{quote}
Ideally, we want terminal but unacknowledged tasks to still be marked unreachable in some way, either via the task state being TASK_UNREACHABLE or via the task being present in unreachableTasks. This would allow, for example, the WebUI to hide sandbox links for unreachable tasks, regardless of whether they were terminal before going unreachable.

But doing this is tricky for various reasons:

--> updateTask() doesn't allow a terminal state to be transitioned to TASK_UNREACHABLE. Right now, when we call updateTask() for a terminal task, it still adds a TASK_UNREACHABLE status to Task.statuses and sends it to operator API stream subscribers, which looks incorrect. In retrospect, having updateTask() internally handle already-terminal tasks was a bad design decision; I think callers should instead avoid calling it for terminal tasks.

--> It's not clear to our users what a "completed" task means. The intention was for completed tasks to be a cache of terminal and acknowledged tasks that stores recent history. WebUI users probably equate "Completed Tasks" with terminal tasks regardless of their acknowledgement status, which is why it confuses them to see terminal but unacknowledged tasks in the "Active tasks" section of the WebUI.

--> When a framework reconciles the state of a task on an unreachable agent, the master replies with TASK_UNREACHABLE regardless of whether the task was non-terminal, terminal but unacknowledged, or terminal and acknowledged when the agent went unreachable.

I think the direction we want to go in is:

--> Completed tasks should consist of terminal unacknowledged and terminal acknowledged tasks, likely in two different data structures.
--> Unreachable tasks should consist of all non-completed tasks on an unreachable agent. All the tasks in this map should be in the TASK_UNREACHABLE state.
{quote}


> Update master task loss handling.
> ---------------------------------
>
>                 Key: MESOS-8405
>                 URL: https://issues.apache.org/jira/browse/MESOS-8405
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: James Peach
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)