Posted to issues@ambari.apache.org by "Nate Cole (JIRA)" <ji...@apache.org> on 2017/04/11 18:36:41 UTC

[jira] [Created] (AMBARI-20736) Allow Potentially Long Running Restart Commands To Have Their Own Timeout

Nate Cole created AMBARI-20736:
----------------------------------

             Summary: Allow Potentially Long Running Restart Commands To Have Their Own Timeout
                 Key: AMBARI-20736
                 URL: https://issues.apache.org/jira/browse/AMBARI-20736
             Project: Ambari
          Issue Type: Bug
          Components: ambari-server
            Reporter: Nate Cole
            Assignee: Nate Cole
            Priority: Critical
             Fix For: 2.5.1


During an upgrade of a cluster, some commands are expected to take a very long time, depending on the size of the cluster and how much data is stored. For example, a NameNode restart that waits for SafeMode exit may take in excess of 30 minutes, while on other clusters the same restart could take less than 1 minute.

Today, the only way to adjust this timeout is to do so across the board for all commands, by editing {{ambari.properties}} and setting {{agent.task.timeout}}. This does not work well, because the majority of restarts during an upgrade are not of master components. Raising the global value to suit a few long-running master restarts would also let every other command run that long before it times out.

There needs to be a way to tell Ambari that a particular restart is allowed to run for a relatively long period of time.

- Both the Java server and the Python agent need to be considered here. The Python side must not give up and return a {{FAILED}} state, and the Ambari server must not mark the task {{TIMEDOUT}}.

- This would be useful for both normal restarts and upgrade scenarios.
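
The following is a minimal Python sketch of the agent side. It assumes the server sends a per-command timeout in seconds along with the command. {{run_command}}, {{server_deadline}} and the 60-second grace period are hypothetical names and values, not existing Ambari code. The sketch only illustrates the two bullets above: the agent enforces the command's own limit, and the server waits somewhat longer than that. This keeps either side from reporting {{FAILED}} or {{TIMEDOUT}} early.

```python
import subprocess
import sys

# Status names mirroring the states mentioned in the issue (hypothetical here).
COMPLETED, FAILED, TIMEDOUT = "COMPLETED", "FAILED", "TIMEDOUT"


def run_command(argv, timeout_seconds):
    """Agent side (sketch): run one command under its own timeout.

    The timeout is assumed to arrive with the command from the server,
    instead of being a single agent-wide value.
    """
    try:
        result = subprocess.run(argv, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child and waits for it before re-raising.
        return TIMEDOUT
    return COMPLETED if result.returncode == 0 else FAILED


def server_deadline(agent_timeout_seconds, grace_seconds=60):
    """Server side (sketch): wait a little longer than the agent does, so
    the task is not marked TIMEDOUT while the agent is still within the
    command's own allowance. The grace period is an assumed value."""
    return agent_timeout_seconds + grace_seconds


if __name__ == "__main__":
    # A fast command finishes well inside a long per-command limit.
    print(run_command([sys.executable, "-c", "pass"], timeout_seconds=1800))
    print(server_deadline(1800))
```

Keeping the server's deadline strictly greater than the agent's is one way to make the agent, not the server, the side that decides a command has run too long.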

h3. Upgrade Only
If this functionality is considered only in the context of an upgrade, the timeout could be placed inside the upgrade pack XML:
{code}
    <service name="HDFS">
      <component name="NAMENODE">
        <upgrade>
          <task xsi:type="restart-task" timeout="1800"/>
        </upgrade>
      </component>
    </service>
{code}
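
The next sketch shows how a reader of the XML above might pick up the per-task timeout and otherwise fall back to the global value (standing in for {{agent.task.timeout}}). {{restart_timeout}} is a hypothetical helper, not Ambari's actual upgrade pack parser. The {{<processing>}} wrapper element is an assumption added so that the {{xsi}} prefix is declared and the fragment parses on its own.

```python
import xml.etree.ElementTree as ET

# xsi:type is a namespaced attribute; ElementTree exposes it in {uri}name form.
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

# Hypothetical wrapper element around the fragment from the issue.
UPGRADE_PACK = """<processing xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <service name="HDFS">
    <component name="NAMENODE">
      <upgrade>
        <task xsi:type="restart-task" timeout="1800"/>
      </upgrade>
    </component>
  </service>
</processing>"""


def restart_timeout(pack_xml, service, component, default_seconds):
    """Return the restart-task timeout (seconds) for a component,
    or default_seconds when the pack does not specify one."""
    root = ET.fromstring(pack_xml)
    for svc in root.iter("service"):
        if svc.get("name") != service:
            continue
        for comp in svc.findall("component"):
            if comp.get("name") != component:
                continue
            for task in comp.iter("task"):
                if task.get(XSI_TYPE) == "restart-task" and task.get("timeout"):
                    return int(task.get("timeout"))
    # No override: behave as today, using the global timeout.
    return default_seconds


if __name__ == "__main__":
    print(restart_timeout(UPGRADE_PACK, "HDFS", "NAMENODE", default_seconds=900))
```

Components without a {{timeout}} attribute keep the existing global behaviour, so only the restarts that need it are lengthened.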

- This would allow future mpacks to control how long their components may take to restart. Perhaps this could be abstracted further, so that the task refers to a named parameter instead of a literal value:

{code}
    <service name="HDFS">
      <component name="NAMENODE">
        <upgrade>
          <task xsi:type="restart-task" timeout="upgrade.parameter.master.restart.long"/>
        </upgrade>
      </component>
    </service>
{code}

with the named values (in seconds) defined separately:

{code}
upgrade.parameter.slave.restart.short = 300
upgrade.parameter.slave.restart.long = 900
upgrade.parameter.master.restart.short = 1500
upgrade.parameter.master.restart.long = 1800
{code}
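
With the abstraction above, the {{timeout}} attribute can hold either a literal number of seconds or a parameter name. The Python sketch below resolves both forms. {{parse_properties}} and {{resolve_timeout}} are hypothetical helpers. The parser handles only simple {{key = value}} lines, not the full Java properties syntax. Falling back to the default for an unknown name is an assumed design choice; failing validation of the upgrade pack would be a reasonable alternative.

```python
PROPERTIES = """
upgrade.parameter.slave.restart.short = 300
upgrade.parameter.slave.restart.long = 900
upgrade.parameter.master.restart.short = 1500
upgrade.parameter.master.restart.long = 1800
"""


def parse_properties(text):
    """Minimal 'key = value' parser (no escapes or line continuations)."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blanks and comments
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def resolve_timeout(attr, props, default_seconds):
    """Resolve a timeout attribute that is a literal or a parameter name."""
    if attr is None:
        return default_seconds          # no timeout given on the task
    if attr.isdigit():
        return int(attr)                # literal, e.g. timeout="1800"
    value = props.get(attr)             # named parameter
    if value is None or not value.isdigit():
        return default_seconds          # unknown or malformed: fall back
    return int(value)


if __name__ == "__main__":
    props = parse_properties(PROPERTIES)
    print(resolve_timeout("upgrade.parameter.master.restart.long", props, 600))
```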



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)