Posted to issues@ambari.apache.org by "Hudson (JIRA)" <ji...@apache.org> on 2018/06/13 17:00:00 UTC

[jira] [Commented] (AMBARI-24090) Resuming a Paused Upgrade Attempts To Retry Tasks Which Were Already Passed

    [ https://issues.apache.org/jira/browse/AMBARI-24090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16511428#comment-16511428 ] 

Hudson commented on AMBARI-24090:
---------------------------------

SUCCESS: Integrated in Jenkins build Ambari-trunk-Commit #9448 (See [https://builds.apache.org/job/Ambari-trunk-Commit/9448/])
[AMBARI-24090] - Resuming a Paused Upgrade Attempts To Retry Tasks Which (github: [https://gitbox.apache.org/repos/asf?p=ambari.git&a=commit&h=58fdcab8d348b7fa6ad7624bafc9da89429350ce])
* (edit) ambari-server/src/main/java/org/apache/ambari/server/controller/internal/UpgradeResourceProvider.java
* (edit) ambari-server/src/test/java/org/apache/ambari/server/controller/internal/UpgradeResourceProviderTest.java


> Resuming a Paused Upgrade Attempts To Retry Tasks Which Were Already Passed
> ---------------------------------------------------------------------------
>
>                 Key: AMBARI-24090
>                 URL: https://issues.apache.org/jira/browse/AMBARI-24090
>             Project: Ambari
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Jonathan Hurley
>            Assignee: Jonathan Hurley
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 2.7.0
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Steps to reproduce (STR):
> - Perform a stack upgrade in which a task on a slave component times out.
> - Ignore and Proceed past this task and continue to the Component Version Check. It should fail (since a task on a slave timed out).
> - Pause the upgrade, then fix the version by restarting the component.
> - Resume the upgrade.
> The upgrade attempts to place the timed-out task back into a PENDING state.
> I think I have an idea about what's going on here. Consider the following upgrade tasks:
> - 1: COMPLETED
> - 2: HOLDING_TIMEDOUT
> - 3: PENDING
> - 4: PENDING
> - 5: PENDING
> When you "Ignore and Proceed", it sets the {{HOLDING_TIMEDOUT}} task to {{TIMEDOUT}}, which is technically a completed state, and the upgrade continues until the next failure:
> - 1: COMPLETED
> - 2: TIMEDOUT
> - 3: COMPLETED
> - 4: HOLDING_FAILED
> - 5: PENDING
> Now, when you pause the upgrade, we set every "scheduled" (not yet completed) task to {{ABORTED}}, which preserves the existing completed states:
> - 1: COMPLETED
> - 2: TIMEDOUT
> - 3: COMPLETED
> - 4: ABORTED
> - 5: ABORTED
> When you resume the upgrade, it searches for all {{ABORTED}} _AND_ {{TIMEDOUT}} tasks and resets them to {{PENDING}}:
> - 1: COMPLETED
> - 2: PENDING
> - 3: COMPLETED
> - 4: PENDING
> - 5: PENDING
> So, because an earlier task (2) is set back to {{PENDING}} while a later task (3) has already completed, the scheduler fails.
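A small model can reproduce the state transitions walked through above. The following is a minimal Java sketch under stated assumptions. `TaskStatus`, `UpgradeResumeSketch` and all of their methods are hypothetical names and are not Ambari's `HostRoleStatus` or `UpgradeResourceProvider` code. The set of "completed" states (COMPLETED, TIMEDOUT, ABORTED) is also an assumption. `resumeAbortedOnly` shows one possible alternative and is not necessarily what the commit changed. The sketch shows that resetting TIMEDOUT along with ABORTED leaves task 2 PENDING ahead of the already completed task 3.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified task states; not Ambari's actual HostRoleStatus enum.
enum TaskStatus {
    PENDING, COMPLETED, HOLDING_FAILED, TIMEDOUT, ABORTED;

    // Assumption: the terminal ("completed") states in this model.
    boolean isCompletedState() {
        return this == COMPLETED || this == TIMEDOUT || this == ABORTED;
    }
}

class UpgradeResumeSketch {
    // Pause: every task not yet in a completed state becomes ABORTED.
    static List<TaskStatus> pause(List<TaskStatus> tasks) {
        List<TaskStatus> out = new ArrayList<>();
        for (TaskStatus s : tasks) {
            out.add(s.isCompletedState() ? s : TaskStatus.ABORTED);
        }
        return out;
    }

    // Resume as described in the report: both ABORTED and TIMEDOUT are reset to PENDING.
    static List<TaskStatus> resumeAsReported(List<TaskStatus> tasks) {
        List<TaskStatus> out = new ArrayList<>();
        for (TaskStatus s : tasks) {
            boolean reset = s == TaskStatus.ABORTED || s == TaskStatus.TIMEDOUT;
            out.add(reset ? TaskStatus.PENDING : s);
        }
        return out;
    }

    // Hypothetical alternative (not necessarily the committed fix): reset only ABORTED
    // tasks and leave the ignored TIMEDOUT task untouched.
    static List<TaskStatus> resumeAbortedOnly(List<TaskStatus> tasks) {
        List<TaskStatus> out = new ArrayList<>();
        for (TaskStatus s : tasks) {
            out.add(s == TaskStatus.ABORTED ? TaskStatus.PENDING : s);
        }
        return out;
    }

    // Returns the 0-based index of the first PENDING task that precedes an already
    // finished (COMPLETED or TIMEDOUT) task, or -1 if the ordering is consistent.
    static int firstOutOfOrderPending(List<TaskStatus> tasks) {
        int firstPending = -1;
        for (int i = 0; i < tasks.size(); i++) {
            TaskStatus s = tasks.get(i);
            if (s == TaskStatus.PENDING) {
                if (firstPending < 0) firstPending = i;
            } else if (firstPending >= 0 && (s == TaskStatus.COMPLETED || s == TaskStatus.TIMEDOUT)) {
                return firstPending;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Tasks 1..5 after "Ignore and Proceed", as listed in the report.
        List<TaskStatus> afterProceed = Arrays.asList(
                TaskStatus.COMPLETED, TaskStatus.TIMEDOUT, TaskStatus.COMPLETED,
                TaskStatus.HOLDING_FAILED, TaskStatus.PENDING);
        List<TaskStatus> paused = pause(afterProceed);
        List<TaskStatus> buggy = resumeAsReported(paused);
        List<TaskStatus> alternative = resumeAbortedOnly(paused);

        System.out.println(paused);      // [COMPLETED, TIMEDOUT, COMPLETED, ABORTED, ABORTED]
        System.out.println(buggy);       // [COMPLETED, PENDING, COMPLETED, PENDING, PENDING]
        System.out.println(firstOutOfOrderPending(buggy));       // 1 (task 2)
        System.out.println(alternative); // [COMPLETED, TIMEDOUT, COMPLETED, PENDING, PENDING]
        System.out.println(firstOutOfOrderPending(alternative)); // -1
    }
}
```

The ordering check makes the reported failure concrete. After the reported resume, task 2 (index 1) is PENDING while task 3 is already COMPLETED. Resetting only the ABORTED tasks re-runs the version check (task 4) and the remaining work without reviving the ignored task.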



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)