Posted to dev@ambari.apache.org by "Dmitry Lysnichenko (JIRA)" <ji...@apache.org> on 2014/01/16 20:03:21 UTC

[jira] [Created] (AMBARI-4324) Server should rely on command reports when considering tasks timed out

Dmitry Lysnichenko created AMBARI-4324:
------------------------------------------

             Summary: Server should rely on command reports when considering tasks timed out
                 Key: AMBARI-4324
                 URL: https://issues.apache.org/jira/browse/AMBARI-4324
             Project: Ambari
          Issue Type: Improvement
          Components: agent, controller
    Affects Versions: 1.5.0
            Reporter: Dmitry Lysnichenko
            Assignee: Dmitry Lysnichenko
             Fix For: 1.5.0


Currently, the task timeout on the server and the timeout on the agent are two separate mechanisms that work independently and duplicate each other.

Such behaviour leads to a strange scenario:
- cluster installation is started
- execution of some command exceeds timeout
- server considers this command and *all subsequent* commands in the request timed out. This state is shown in the UI as well.
- at the same time, agent considers the currently executing command timed out and kills it. After that, agent starts executing the next command in the queue. If the next commands do not fail, agent sends COMPLETE status reports.
- server receives COMPLETE status reports and updates component status.
- if user clicks "Retry installation", tasks are created only for components that are not installed.
- as a result, UI shows fewer tasks than user expects

Changes in scope of this jira:
Add a TIMEDOUT command status report type at the agent. On the server side, the HostRoleStatus enum already has this status type. Modify server behaviour: the server considers a task timed out when it receives the corresponding command report from the agent. This way, all task time tracking logic is consolidated at the agent. Doing so will simplify timeout handling for CustomCommands and CustomActions.
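
A minimal sketch of the proposed flow, not the actual Ambari code. It assumes the agent runs each command as a subprocess and that a report is a plain dict; the helper names run_command and apply_report, the report fields, and the IN_PROGRESS/QUEUED/FAILED strings are hypothetical. Only TIMEDOUT and COMPLETE come from the wording of this issue.

```python
import subprocess
import sys

# Status names follow the wording of this issue; the exact strings used by
# the real agent and server are an assumption here.
COMPLETE = "COMPLETE"
FAILED = "FAILED"
TIMEDOUT = "TIMEDOUT"


def run_command(task_id, argv, timeout_s):
    """Agent side (hypothetical helper): run one command and build its report."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True,
                              timeout=timeout_s)
        status = COMPLETE if proc.returncode == 0 else FAILED
        exitcode = proc.returncode
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before re-raising TimeoutExpired,
        # so the agent can move on to the next command in its queue.
        status = TIMEDOUT
        exitcode = None
    return {"taskId": task_id, "status": status, "exitcode": exitcode}


def apply_report(task_states, report):
    """Server side (hypothetical helper): only the reported task changes state."""
    task_states[report["taskId"]] = report["status"]
    return task_states


if __name__ == "__main__":
    states = {1: "IN_PROGRESS", 2: "QUEUED"}
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    apply_report(states, run_command(1, slow, timeout_s=0.5))
    print(states)
```

In this sketch the server keeps no timer of its own: task 1 becomes TIMEDOUT because the agent said so, and task 2 stays queued until its own report arrives.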

Some issues may occur when the agent host goes down and therefore does not send any command reports. The server should have some handling for this case.
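
One possible handling, sketched under assumptions that this issue does not state: the server tracks the time of the last heartbeat per host and, once a host has been silent longer than a threshold, marks that host's unfinished tasks as TIMEDOUT itself. The function name, the task dict shape, the QUEUED/IN_PROGRESS strings, and the 120 second threshold are all hypothetical.

```python
import time

# Hypothetical threshold; the issue does not specify a value or a mechanism.
HEARTBEAT_LOST_AFTER_S = 120.0


def expire_tasks_on_lost_hosts(tasks, last_heartbeat, now=None,
                               lost_after_s=HEARTBEAT_LOST_AFTER_S):
    """Mark unfinished tasks TIMEDOUT for hosts whose heartbeat is stale.

    tasks: list of dicts with "host" and "status" keys (illustrative shape).
    last_heartbeat: dict mapping host name to the time of its last heartbeat.
    """
    now = time.time() if now is None else now
    for task in tasks:
        seen = last_heartbeat.get(task["host"])
        # A host that never sent a heartbeat is treated as lost as well.
        host_lost = seen is None or now - seen > lost_after_s
        if host_lost and task["status"] in ("QUEUED", "IN_PROGRESS"):
            task["status"] = "TIMEDOUT"
    return tasks
```

Tasks on hosts that are still sending heartbeats are left alone here, so the agent's own TIMEDOUT report remains the normal path and this check acts only as a fallback.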




