Posted to issues@ambari.apache.org by "Jonathan Hurley (JIRA)" <ji...@apache.org> on 2018/06/13 13:58:00 UTC

[jira] [Created] (AMBARI-24090) Resuming a Paused Upgrade Attempts To Retry Tasks Which Were Already Passed

Jonathan Hurley created AMBARI-24090:
----------------------------------------

             Summary: Resuming a Paused Upgrade Attempts To Retry Tasks Which Were Already Passed
                 Key: AMBARI-24090
                 URL: https://issues.apache.org/jira/browse/AMBARI-24090
             Project: Ambari
          Issue Type: Bug
    Affects Versions: 2.6.0
            Reporter: Jonathan Hurley
            Assignee: Jonathan Hurley
             Fix For: 2.7.0


STR:
- Perform a stack upgrade during which a task on a slave fails by timing out
- Choose Ignore and Proceed for this task and continue to the Component Version Check. It should fail (since a task timed out on a slave)
- Pause the upgrade and then fix the version by restarting the component
- Resume the upgrade

The upgrade attempts to place the timed-out task back into a {{PENDING}} state.

I think I have an idea about what's going on here. Consider the following upgrade tasks:

- 1: COMPLETED
- 2: HOLDING_TIMEDOUT
- 3: PENDING
- 4: PENDING
- 5: PENDING

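When you "Ignore and Proceed", it sets the {{HOLDING_TIMEDOUT}} to {{TIMEDOUT}}, which is technically a completed state. The upgrade then continues until a later task fails and holds (task 4 here, such as the Component Version Check in the steps above):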

- 1: COMPLETED
- 2: TIMEDOUT
- 3: COMPLETED
- 4: HOLDING_FAILED
- 5: PENDING

Now, when you pause the upgrade, we set every "scheduled" state to {{ABORTED}}, which preserves the states of tasks that have already finished:

- 1: COMPLETED
- 2: TIMEDOUT
- 3: COMPLETED
- 4: ABORTED
- 5: ABORTED

When you go to resume the upgrade, it searches for all {{ABORTED}} _AND_ {{TIMEDOUT}} tasks and resets them to {{PENDING}}:

- 1: COMPLETED
- 2: PENDING
- 3: COMPLETED
- 4: PENDING
- 5: PENDING

So, because an earlier task is now set to {{PENDING}}, this causes the scheduler to barf.
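
The sequence above can be reproduced with a small standalone sketch. This is not Ambari code: the {{Status}} enum, the {{pause}} and {{resume}} methods, and the set of states treated as "scheduled" are all assumptions made for illustration, inferred from the task lists in this description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ResumeSketch {
  // Simplified stand-in for the task states named in this issue (not Ambari's real enum).
  enum Status { PENDING, COMPLETED, HOLDING_TIMEDOUT, HOLDING_FAILED, TIMEDOUT, ABORTED }

  // Pause: anything not yet finished becomes ABORTED. Which states count as
  // "scheduled" is an assumption inferred from the lists in this description.
  static List<Status> pause(List<Status> tasks) {
    List<Status> result = new ArrayList<>();
    for (Status s : tasks) {
      boolean scheduled = s == Status.PENDING
          || s == Status.HOLDING_FAILED
          || s == Status.HOLDING_TIMEDOUT;
      result.add(scheduled ? Status.ABORTED : s);
    }
    return result;
  }

  // Resume as described in this issue: ABORTED and TIMEDOUT are both reset to PENDING.
  static List<Status> resume(List<Status> tasks) {
    List<Status> result = new ArrayList<>();
    for (Status s : tasks) {
      boolean reset = s == Status.ABORTED || s == Status.TIMEDOUT;
      result.add(reset ? Status.PENDING : s);
    }
    return result;
  }

  // Returns the 1-based number of the first PENDING task that has a COMPLETED
  // task after it, or -1 if no such task exists.
  static int firstOutOfOrderPending(List<Status> tasks) {
    for (int i = 0; i < tasks.size(); i++) {
      if (tasks.get(i) != Status.PENDING) {
        continue;
      }
      for (int j = i + 1; j < tasks.size(); j++) {
        if (tasks.get(j) == Status.COMPLETED) {
          return i + 1;
        }
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    // State after "Ignore and Proceed" and the later held failure.
    List<Status> beforePause = Arrays.asList(
        Status.COMPLETED, Status.TIMEDOUT, Status.COMPLETED,
        Status.HOLDING_FAILED, Status.PENDING);

    List<Status> resumed = resume(pause(beforePause));
    System.out.println(resumed);
    System.out.println("out-of-order PENDING task: " + firstOutOfOrderPending(resumed));
  }
}
```

Running {{main}} prints the same final list as above, with task 2 reported as a {{PENDING}} task sitting in front of a {{COMPLETED}} one.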



