Posted to dev@ambari.apache.org by "Sumit Mohanty (JIRA)" <ji...@apache.org> on 2014/01/17 18:40:20 UTC

[jira] [Comment Edited] (AMBARI-4324) Server should rely on command reports when considering tasks timed out

    [ https://issues.apache.org/jira/browse/AMBARI-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874909#comment-13874909 ] 

Sumit Mohanty edited comment on AMBARI-4324 at 1/17/14 5:38 PM:
----------------------------------------------------------------

[~dmitriusan], this is a good feature.

We will also need to support cancelling a request, or a single task within a request, through the API. I am working on a proposal for that. Can you ensure there are hooks to add Cancel commands when a cancel is initiated through an API call?

Typically, agents do not have a long queue of pending tasks. Just wondering whether Controller.py should process the cancel synchronously. Perhaps it is OK.

We should have the ability to interrupt INPROGRESS tasks as well. I am thinking of runaway tasks or misconfigured timeouts. What would it take to have this ability?
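
To make the two cancel cases concrete (dropping a task that is still pending versus interrupting one that is already running), here is a minimal sketch. It is not the agent's actual code: ActionQueueSketch, put, start_next, cancel and the returned strings are all hypothetical names chosen for illustration, and it assumes each task runs as a child process.

```python
import subprocess
import threading
from collections import deque


class ActionQueueSketch:
    """Hypothetical agent-side queue; all names here are illustrative."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = deque()   # (task_id, argv) tuples waiting to run
        self._running = {}        # task_id -> subprocess.Popen

    def put(self, task_id, argv):
        with self._lock:
            self._pending.append((task_id, argv))

    def start_next(self):
        """Start the next pending task, if any, and return its id."""
        with self._lock:
            if not self._pending:
                return None
            task_id, argv = self._pending.popleft()
            self._running[task_id] = subprocess.Popen(argv)
            return task_id

    def cancel(self, task_id):
        """Cancel a task; returns 'DEQUEUED', 'KILLED' or 'NOT_FOUND'."""
        with self._lock:
            # Case 1: the task has not started yet, so just drop it.
            for item in self._pending:
                if item[0] == task_id:
                    self._pending.remove(item)
                    return "DEQUEUED"
            proc = self._running.pop(task_id, None)
        if proc is None:
            return "NOT_FOUND"
        # Case 2: the task is in progress, so interrupt the child process.
        proc.kill()
        proc.wait()   # reap the child so it does not linger as a zombie
        return "KILLED"
```

Because the pending queue is usually short, doing the cancel synchronously under a lock as above should be cheap. The kill happens outside the lock so a slow child exit does not block other queue operations.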

What is the other JIRA ID? The current link is to this JIRA.
{quote}
This implementation should also solve another related jira AMBARI-4324
{quote}



> Server should rely on command reports when considering tasks timed out
> ----------------------------------------------------------------------
>
>                 Key: AMBARI-4324
>                 URL: https://issues.apache.org/jira/browse/AMBARI-4324
>             Project: Ambari
>          Issue Type: Improvement
>          Components: agent, controller
>    Affects Versions: 1.5.0
>            Reporter: Dmitry Lysnichenko
>            Assignee: Dmitry Lysnichenko
>             Fix For: 1.5.0
>
>
> As of now, the task timeout at the server and the timeout at the agent are two different mechanisms that work independently and duplicate each other.
> Such behaviour leads to a strange scenario:
> - cluster installation is started
> - execution of some command exceeds the timeout
> - the server considers this command and *all subsequent* commands in the request timed out. This state is shown in the UI as well.
> - at the same time, the agent considers the currently executing command timed out and kills it. After that, the agent starts executing the next command in the queue. If the next commands do not fail, the agent sends COMPLETE status reports.
> - the server receives the COMPLETE status reports and updates the component status.
> - if the user clicks "Retry installation", tasks are created only for components that are not installed.
> - as a result, the UI shows fewer tasks than the user expects
> Changes in scope of this jira:
> add a TIMEDOUT command status report type at the agent. On the server side, the HostRoleStatus enum already has this status type. Modify the server behaviour: the server considers a task timed out when it receives the appropriate command report from the agent. In this case, all task time tracking logic is consolidated at the agent. Doing that will simplify timeout handling for CustomCommands and CustomActions.
> Some issues may occur when an agent host goes down and therefore does not send any command reports. The server should have some handling for such a case.
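
The behaviour proposed in the description can be sketched roughly as follows. This is an illustration only, written in Python for brevity, and not the server's code: TaskTrackerSketch, its methods, the QUEUED placeholder status and the heartbeat_lost_after threshold are hypothetical. Marking tasks TIMEDOUT when a host stops reporting is just one assumed way to handle the host-down case, which the description leaves open.

```python
# Status names follow the spelling used in this thread; treat them as
# illustrative strings rather than the server's actual enum values.
TERMINAL = {"COMPLETE", "FAILED", "TIMEDOUT"}


class TaskTrackerSketch:
    """Hypothetical server-side tracker driven only by agent reports."""

    def __init__(self, heartbeat_lost_after):
        self.heartbeat_lost_after = heartbeat_lost_after  # seconds
        self.status = {}          # task_id -> status string
        self.host_of = {}         # task_id -> host name
        self.last_heard = {}      # host name -> time of last report

    def schedule(self, task_id, host, now):
        self.status[task_id] = "QUEUED"
        self.host_of[task_id] = host
        self.last_heard.setdefault(host, now)

    def on_report(self, host, task_id, status, now):
        """Apply a command report; the agent decides what timed out."""
        self.last_heard[host] = now
        if self.status.get(task_id) not in TERMINAL:
            self.status[task_id] = status

    def expire_lost_hosts(self, now):
        """Assumed fallback: time out tasks on hosts that went silent."""
        expired = []
        for task_id, host in self.host_of.items():
            silent_for = now - self.last_heard[host]
            if (self.status[task_id] not in TERMINAL
                    and silent_for > self.heartbeat_lost_after):
                self.status[task_id] = "TIMEDOUT"
                expired.append(task_id)
        return expired
```

With this shape the server never runs its own per-task timer. A task becomes TIMEDOUT either because the agent said so or because the agent has been silent for longer than the threshold, so later tasks in the same request are not marked timed out while the agent is still working through them.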



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)