Posted to issues@ambari.apache.org by "Alejandro Fernandez (JIRA)" <ji...@apache.org> on 2017/03/28 02:08:41 UTC

[jira] [Updated] (AMBARI-20593) RU Auto-retry does not start when Restarting NN Batch 2 step is corrupted [Batch 1 was corrupted and fixed before]

     [ https://issues.apache.org/jira/browse/AMBARI-20593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alejandro Fernandez updated AMBARI-20593:
-----------------------------------------
    Description: 
STR:
1) Install ambari 2.5.0.1
In the ambari.properties file, set
stack.upgrade.auto.retry.timeout.mins=6
stack.upgrade.auto.retry.check.interval.secs=30

2) Install HDP with any set of services
3) Add NameNode HA
4) Register and install new HDP stack version
5) Start RU
6) Corrupt one step from the Core Masters group (e.g., stop ambari-agent on a node while the command is running)
Ambari will restart the "Restarting NN Batch 1" step
7) Fix the corrupted step (e.g., start ambari-agent again)
8) Corrupt another step before its command is scheduled (e.g., stop ambari-agent on a node)
9) Fix the corrupted step (e.g., start ambari-agent again)

The expectation is that Ambari Server schedules the command on the 2nd node. However, the command was never assigned an original_start_time or a start_time, so the RetryUpgradeActionService had no timestamps to compare against and could not retry it.
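
To make the failure mode concrete, here is a minimal sketch of a time-window retry check. It is not Ambari's code: the Command class, the RETRYABLE_STATUSES set, and both function names are hypothetical, and the use of None for an unset timestamp is an assumption. Only the two property names and their values come from the steps above. The "lenient" variant shows one conceivable way to handle a command that was never scheduled; it is not necessarily the fix applied for this issue.

```python
from dataclasses import dataclass
from typing import Optional

# Values taken from the ambari.properties settings in the steps above.
RETRY_TIMEOUT_MINS = 6      # stack.upgrade.auto.retry.timeout.mins
CHECK_INTERVAL_SECS = 30    # stack.upgrade.auto.retry.check.interval.secs

# Hypothetical set of statuses considered retryable in this sketch.
RETRYABLE_STATUSES = {"HOLDING_FAILED", "HOLDING_TIMEDOUT"}


@dataclass
class Command:
    """Hypothetical stand-in for a host command; not Ambari's real class."""
    status: str
    original_start_time: Optional[int] = None  # epoch millis, None if never set
    start_time: Optional[int] = None           # epoch millis, None if never set


def can_retry_strict(cmd: Command, now_ms: int,
                     timeout_mins: int = RETRY_TIMEOUT_MINS) -> bool:
    """Models the reported behavior: without timestamps, nothing is retried."""
    if cmd.status not in RETRYABLE_STATUSES:
        return False
    if cmd.original_start_time is None or cmd.start_time is None:
        # No timestamps to compare against, so the command is skipped.
        return False
    return now_ms - cmd.original_start_time <= timeout_mins * 60_000


def can_retry_lenient(cmd: Command, now_ms: int,
                      timeout_mins: int = RETRY_TIMEOUT_MINS) -> bool:
    """Assumed alternative: a never-started command opens its window now."""
    if cmd.status not in RETRYABLE_STATUSES:
        return False
    window_start = (cmd.original_start_time
                    if cmd.original_start_time is not None else now_ms)
    return now_ms - window_start <= timeout_mins * 60_000


# With these settings, a periodic checker would run this many times
# inside one retry window.
checks_per_window = (RETRY_TIMEOUT_MINS * 60) // CHECK_INTERVAL_SECS
```

Under these assumptions, a command that failed before it was ever scheduled is rejected by can_retry_strict but accepted by can_retry_lenient, while a command whose window has already expired is rejected by both.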

  was:
STR:
1)Deploy cluster
2)Register and install new stack version 
3)Add properties for auto retries in ambari.properties file
stack.upgrade.auto.retry.timeout.mins=6
stack.upgrade.auto.retry.check.interval.secs=30
4)Start RU
5)Corrupt one step from CORE for rolling upgrade (stop ambari-agent on a node) [Restarting NN Batch 1 ]

6)Fix corrupted step
7) Corrupt another step from CORE for rolling upgrade (stop ambari-agent on another node) [Restarting NN Batch 2]

Actual result: RU: Paused upgrade (step was failed) but auto retries did not happen


> RU Auto-retry does not start when Restarting NN Batch 2 step is corrupted [Batch 1 was corrupted and fixed before]
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: AMBARI-20593
>                 URL: https://issues.apache.org/jira/browse/AMBARI-20593
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>    Affects Versions: 2.5.0
>         Environment: rolling upgrade
>            Reporter: Sviatoslav Tereshchenko
>              Labels: rolling_upgrade
>             Fix For: 2.5.1



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)