Posted to issues@ambari.apache.org by "Jonathan Hurley (JIRA)" <ji...@apache.org> on 2018/06/13 13:58:00 UTC
[jira] [Created] (AMBARI-24090) Resuming a Paused Upgrade Attempts To Retry Tasks Which Were Already Passed
Jonathan Hurley created AMBARI-24090:
----------------------------------------
Summary: Resuming a Paused Upgrade Attempts To Retry Tasks Which Were Already Passed
Key: AMBARI-24090
URL: https://issues.apache.org/jira/browse/AMBARI-24090
Project: Ambari
Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Jonathan Hurley
Assignee: Jonathan Hurley
Fix For: 2.7.0
STR:
- Perform a stack upgrade where during the upgrade, there's a slave failure due to a timed out task
- Ignore and Proceed past this task and get to the Component Version Check. It should fail (since you had a timed out task on a slave)
- Pause the upgrade and then fix the version by restarting the component
- Resume the upgrade
The upgrade attempts to place the timed out task back into a PENDING state.
I think I have an idea about what's going on here. Consider the following upgrade tasks:
- 1: COMPLETED
- 2: HOLDING_TIMEDOUT
- 3: PENDING
- 4: PENDING
- 5: PENDING
When you "Ignore and Proceed", it sets the {{HOLDING_TIMEDOUT}} to {{TIMEDOUT}}, which is technically a completed state. The upgrade then continues until a later task fails:
- 1: COMPLETED
- 2: TIMEDOUT
- 3: COMPLETED
- 4: HOLDING_FAILED
- 5: PENDING
Now you pause the upgrade, and we set every "scheduled" state to {{ABORTED}}. This preserves the states of tasks that have already finished:
- 1: COMPLETED
- 2: TIMEDOUT
- 3: COMPLETED
- 4: ABORTED
- 5: ABORTED
When you resume the upgrade, it searches for all {{ABORTED}} _AND_ {{TIMEDOUT}} tasks and resets them to {{PENDING}}:
- 1: COMPLETED
- 2: PENDING
- 3: COMPLETED
- 4: PENDING
- 5: PENDING
So, because an earlier task (task 2) is now set back to {{PENDING}} while a later task is already {{COMPLETED}}, this causes the scheduler to barf.
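
The sequence above can be reproduced with a small simulation. This is only an illustrative sketch in Python, not Ambari code: the {{Status}} enum, the {{SCHEDULED}} set, and the {{pause}}, {{resume}}, and {{out_of_order}} helpers are hypothetical names, and the set of "scheduled" states is an assumption inferred from the task lists in this report. The last call shows one possible alternative (resetting only {{ABORTED}} tasks on resume); it is not necessarily the fix that was applied.

```python
from enum import Enum


class Status(Enum):
    PENDING = "PENDING"
    COMPLETED = "COMPLETED"
    HOLDING_TIMEDOUT = "HOLDING_TIMEDOUT"
    HOLDING_FAILED = "HOLDING_FAILED"
    TIMEDOUT = "TIMEDOUT"
    ABORTED = "ABORTED"


# Assumption: states treated as "scheduled" (not yet finished) in this sketch.
SCHEDULED = {Status.PENDING, Status.HOLDING_TIMEDOUT, Status.HOLDING_FAILED}


def pause(tasks):
    """Abort every task still in a scheduled state; finished tasks are untouched."""
    return [Status.ABORTED if t in SCHEDULED else t for t in tasks]


def resume(tasks, reset_states):
    """Reset every task whose state is in reset_states back to PENDING."""
    return [Status.PENDING if t in reset_states else t for t in tasks]


def out_of_order(tasks):
    """Return indices of PENDING tasks that precede an already COMPLETED task."""
    return [
        i
        for i, t in enumerate(tasks)
        if t is Status.PENDING and Status.COMPLETED in tasks[i + 1:]
    ]


# State after "Ignore and Proceed" and the later failure (tasks 1 to 5).
before_pause = [
    Status.COMPLETED,
    Status.TIMEDOUT,
    Status.COMPLETED,
    Status.HOLDING_FAILED,
    Status.PENDING,
]

paused = pause(before_pause)

# Behavior described in this report: ABORTED and TIMEDOUT are both reset.
buggy = resume(paused, {Status.ABORTED, Status.TIMEDOUT})

# Hypothetical alternative: only tasks aborted by the pause are reset.
alternative = resume(paused, {Status.ABORTED})

print([s.value for s in buggy], out_of_order(buggy))
print([s.value for s in alternative], out_of_order(alternative))
```

With the reported behavior, {{out_of_order}} flags index 1 (task 2), which is the task that was already ignored and passed. With the alternative, task 2 stays {{TIMEDOUT}} and nothing is flagged.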