You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@ambari.apache.org by Dmitro Lisnichenko <dl...@hortonworks.com> on 2015/11/10 18:30:32 UTC

Re: Review Request 40139: SKIPPED_FAILED state should not be bubbled up to the Upgrade level

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40139/
-----------------------------------------------------------

(Updated Nov. 10, 2015, 7:30 p.m.)


Review request for Ambari, Alejandro Fernandez, Jonathan Hurley, Jayush Luniya, Nate Cole, and Yusaku Sako.


Summary (updated)
-----------------

SKIPPED_FAILED state should not be bubbled up to the Upgrade level


Bugs: AMBARI-13818
    https://issues.apache.org/jira/browse/AMBARI-13818


Repository: ambari


Description (updated)
-------

When there is a skipped failure, the "upgrade" state itself comes SKIPPED_FAILED. Even when the upgrade is running or paused, it is returning "SKIPPED_FAILED". The API should not roll this up to the "upgrade" level as the current behavior is confusing. At the top level, it should just be HOLDING, IN_PROGRESS, COMPLETED, etc. SKIPPED_FAILED should be bubbled up to the upgrade group level and stop there.


Also fixes another blocker:
STR:
1) Install and deploy cluster with older HDP version
2) Enable NameNode HA
3) Register, install new HDP version
4) Start Rolling Upgrade with "Skip all Service Check failures" and "Skip all Slave Component failures" options
5) Break datanode_upgrade.py script and wait for Core Slaves failures
6) Click "Pause upgrade" on "Core Slaves - >Verifying Skipped Failures" step
Result:
Button "Resume upgrade" doesn't work. After clicking on this button I've got next http response
{
  "status" : 400,
  "message" : "java.lang.IllegalArgumentException: Can only set status to PENDING when the upgrade is ABORTED (currently SKIPPED_FAILED)"
}


Diffs
-----

  ambari-server/src/main/java/org/apache/ambari/server/controller/internal/CalculatedStatus.java f87c32c 

Diff: https://reviews.apache.org/r/40139/diff/


Testing (updated)
-------

checked on live cluster

mvn clean test in progress


Thanks,

Dmitro Lisnichenko


Re: Review Request 40139: SKIPPED_FAILED state should not be bubbled up to the Upgrade level

Posted by Alejandro Fernandez <af...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40139/#review105915
-----------------------------------------------------------



ambari-server/src/main/java/org/apache/ambari/server/controller/internal/CalculatedStatus.java (line 360)
<https://reviews.apache.org/r/40139/#comment164607>

    I would prefer to combine this logic with the method above so that it's more maintainable. Can we set skippable to a an enum that accepts 3 values, IGNORE, TRUE, FALSE?


- Alejandro Fernandez


On Nov. 10, 2015, 5:55 p.m., Dmitro Lisnichenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/40139/
> -----------------------------------------------------------
> 
> (Updated Nov. 10, 2015, 5:55 p.m.)
> 
> 
> Review request for Ambari, Alejandro Fernandez, Jonathan Hurley, Jayush Luniya, Nate Cole, and Yusaku Sako.
> 
> 
> Bugs: AMBARI-13818
>     https://issues.apache.org/jira/browse/AMBARI-13818
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> When there is a skipped failure, the "upgrade" state itself comes SKIPPED_FAILED. Even when the upgrade is running or paused, it is returning "SKIPPED_FAILED". The API should not roll this up to the "upgrade" level as the current behavior is confusing. At the top level, it should just be HOLDING, IN_PROGRESS, COMPLETED, etc. SKIPPED_FAILED should be bubbled up to the upgrade group level and stop there.
> 
> 
> Also fixes another blocker:
> STR:
> 1) Install and deploy cluster with older HDP version
> 2) Enable NameNode HA
> 3) Register, install new HDP version
> 4) Start Rolling Upgrade with "Skip all Service Check failures" and "Skip all Slave Component failures" options
> 5) Break datanode_upgrade.py script and wait for Core Slaves failures
> 6) Click "Pause upgrade" on "Core Slaves - >Verifying Skipped Failures" step
> Result:
> Button "Resume upgrade" doesn't work. After clicking on this button I've got next http response
> {
>   "status" : 400,
>   "message" : "java.lang.IllegalArgumentException: Can only set status to PENDING when the upgrade is ABORTED (currently SKIPPED_FAILED)"
> }
> 
> 
> Diffs
> -----
> 
>   ambari-server/src/main/java/org/apache/ambari/server/controller/internal/CalculatedStatus.java f87c32c 
>   ambari-server/src/test/java/org/apache/ambari/server/controller/internal/CalculatedStatusTest.java 4b8587f 
> 
> Diff: https://reviews.apache.org/r/40139/diff/
> 
> 
> Testing
> -------
> 
> checked on live cluster
> 
> mvn clean test in progress
> 
> 
> Thanks,
> 
> Dmitro Lisnichenko
> 
>


Re: Review Request 40139: SKIPPED_FAILED state should not be bubbled up to the Upgrade level

Posted by Jonathan Hurley <jh...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40139/#review106138
-----------------------------------------------------------

Ship it!


Ship It!

- Jonathan Hurley


On Nov. 11, 2015, 2:30 p.m., Dmitro Lisnichenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/40139/
> -----------------------------------------------------------
> 
> (Updated Nov. 11, 2015, 2:30 p.m.)
> 
> 
> Review request for Ambari, Alejandro Fernandez, Jonathan Hurley, Jayush Luniya, Nate Cole, and Yusaku Sako.
> 
> 
> Bugs: AMBARI-13818
>     https://issues.apache.org/jira/browse/AMBARI-13818
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> After discussing this on the call today, it was determined that the web client has enough information to iterate over the data being returned to display the information correctly. However, it's rather undesirable that they are retrieving all UpgradeGroup and UpgradeItem structures on every request. So, they are going to change this so they only retrieve the UpgradeGroup.
> The best compromise here is to surface a new property that describes the display status of upgrade group
> The new display_status field is used as information for the UI to use for display purposes only. The status field will revert to its prior behavior. In this example UpgradeGroup is COMPLETED but the UI has a hint to show SKIPPED_FAILED.
> We should do this for the following states:
> SKIPPED_FAILED
> FAILED
> 
> 
> Also fixes another blocker:
> STR:
> 1) Install and deploy cluster with older HDP version
> 2) Enable NameNode HA
> 3) Register, install new HDP version
> 4) Start Rolling Upgrade with "Skip all Service Check failures" and "Skip all Slave Component failures" options
> 5) Break datanode_upgrade.py script and wait for Core Slaves failures
> 6) Click "Pause upgrade" on "Core Slaves - >Verifying Skipped Failures" step
> Result:
> Button "Resume upgrade" doesn't work. After clicking on this button I've got next http response
> {
>   "status" : 400,
>   "message" : "java.lang.IllegalArgumentException: Can only set status to PENDING when the upgrade is ABORTED (currently SKIPPED_FAILED)"
> }
> 
> 
> Diffs
> -----
> 
>   ambari-server/src/main/java/org/apache/ambari/server/controller/internal/CalculatedStatus.java f87c32c 
>   ambari-server/src/main/java/org/apache/ambari/server/controller/internal/UpgradeGroupResourceProvider.java c4dcd27 
>   ambari-server/src/test/java/org/apache/ambari/server/controller/internal/CalculatedStatusTest.java 4b8587f 
> 
> Diff: https://reviews.apache.org/r/40139/diff/
> 
> 
> Testing
> -------
> 
> checked on live cluster
> 
> mvn clean test in progress
> 
> 
> Thanks,
> 
> Dmitro Lisnichenko
> 
>


Re: Review Request 40139: SKIPPED_FAILED state should not be bubbled up to the Upgrade level

Posted by Dmitro Lisnichenko <dl...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40139/
-----------------------------------------------------------

(Updated Nov. 11, 2015, 9:30 p.m.)


Review request for Ambari, Alejandro Fernandez, Jonathan Hurley, Jayush Luniya, Nate Cole, and Yusaku Sako.


Changes
-------

Uploaded a new patch


Bugs: AMBARI-13818
    https://issues.apache.org/jira/browse/AMBARI-13818


Repository: ambari


Description (updated)
-------

After discussing this on the call today, it was determined that the web client has enough information to iterate over the data being returned to display the information correctly. However, it's rather undesirable that they are retrieving all UpgradeGroup and UpgradeItem structures on every request. So, they are going to change this so they only retrieve the UpgradeGroup.
The best compromise here is to surface a new property that describes the display status of upgrade group
The new display_status field is used as information for the UI to use for display purposes only. The status field will revert to its prior behavior. In this example UpgradeGroup is COMPLETED but the UI has a hint to show SKIPPED_FAILED.
We should do this for the following states:
SKIPPED_FAILED
FAILED


Also fixes another blocker:
STR:
1) Install and deploy cluster with older HDP version
2) Enable NameNode HA
3) Register, install new HDP version
4) Start Rolling Upgrade with "Skip all Service Check failures" and "Skip all Slave Component failures" options
5) Break datanode_upgrade.py script and wait for Core Slaves failures
6) Click "Pause upgrade" on "Core Slaves - >Verifying Skipped Failures" step
Result:
Button "Resume upgrade" doesn't work. After clicking on this button I've got next http response
{
  "status" : 400,
  "message" : "java.lang.IllegalArgumentException: Can only set status to PENDING when the upgrade is ABORTED (currently SKIPPED_FAILED)"
}


Diffs (updated)
-----

  ambari-server/src/main/java/org/apache/ambari/server/controller/internal/CalculatedStatus.java f87c32c 
  ambari-server/src/main/java/org/apache/ambari/server/controller/internal/UpgradeGroupResourceProvider.java c4dcd27 
  ambari-server/src/test/java/org/apache/ambari/server/controller/internal/CalculatedStatusTest.java 4b8587f 

Diff: https://reviews.apache.org/r/40139/diff/


Testing
-------

checked on live cluster

mvn clean test in progress


Thanks,

Dmitro Lisnichenko


Re: Review Request 40139: SKIPPED_FAILED state should not be bubbled up to the Upgrade level

Posted by Jonathan Hurley <jh...@hortonworks.com>.

> On Nov. 10, 2015, 2:31 p.m., Jonathan Hurley wrote:
> > ambari-server/src/main/java/org/apache/ambari/server/controller/internal/CalculatedStatus.java, lines 352-361
> > <https://reviews.apache.org/r/40139/diff/2/?file=1122084#file1122084line352>
> >
> >     So my main issue here is that for someone not completely familiar with this issue, they could easily use the other method which would produce a status of SKIPPED_FAILED.
> >     
> >     It's just too easy to pick the wrong method here when writing dependent code.
> >     
> >     With that said, why are we even bubbling this up? This should behave as any other failure:
> >     
> >     counters.get(HostRoleStatus.FAILED) > 0 && !skippable ? HostRoleStatus.FAILED :
> >     
> >     - If the command fails and it wasn't skippable, then the stage and request are FAILED
> >     - If the command fails and it was skippable, then the stage and request are COMPLETED but the command itself still failed
> >     
> >     Just because it was "auto skipped" doesn't mean it should display differently.
> 
> Jonathan Hurley wrote:
>     To clarify; auto skipped should leave the command as SKIPPED_FAILED, but the stage and request should not be - just like how it is for a failure which was skipped by hand.
> 
> Nate Cole wrote:
>     The main thrust of this is that the UI guys want the Stage (Upgrade Item and Upgrade Group) to show SKIPPED_FAILED with the icon they have (that "reply" arrow swoop), but the request (Upgrade) should not.  They want that without having to iterate to the Task json object to find out that there are SKIPPED_FAILED tasks.

I think they should have to iterate over it - logic-wise, there's no difference between this and a normal skipped failure. The state calculations should remain the same. If the UI needs to display different information, then they should do the iteration work.


- Jonathan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40139/#review105932
-----------------------------------------------------------


On Nov. 10, 2015, 12:55 p.m., Dmitro Lisnichenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/40139/
> -----------------------------------------------------------
> 
> (Updated Nov. 10, 2015, 12:55 p.m.)
> 
> 
> Review request for Ambari, Alejandro Fernandez, Jonathan Hurley, Jayush Luniya, Nate Cole, and Yusaku Sako.
> 
> 
> Bugs: AMBARI-13818
>     https://issues.apache.org/jira/browse/AMBARI-13818
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> When there is a skipped failure, the "upgrade" state itself comes SKIPPED_FAILED. Even when the upgrade is running or paused, it is returning "SKIPPED_FAILED". The API should not roll this up to the "upgrade" level as the current behavior is confusing. At the top level, it should just be HOLDING, IN_PROGRESS, COMPLETED, etc. SKIPPED_FAILED should be bubbled up to the upgrade group level and stop there.
> 
> 
> Also fixes another blocker:
> STR:
> 1) Install and deploy cluster with older HDP version
> 2) Enable NameNode HA
> 3) Register, install new HDP version
> 4) Start Rolling Upgrade with "Skip all Service Check failures" and "Skip all Slave Component failures" options
> 5) Break datanode_upgrade.py script and wait for Core Slaves failures
> 6) Click "Pause upgrade" on "Core Slaves - >Verifying Skipped Failures" step
> Result:
> Button "Resume upgrade" doesn't work. After clicking on this button I've got next http response
> {
>   "status" : 400,
>   "message" : "java.lang.IllegalArgumentException: Can only set status to PENDING when the upgrade is ABORTED (currently SKIPPED_FAILED)"
> }
> 
> 
> Diffs
> -----
> 
>   ambari-server/src/main/java/org/apache/ambari/server/controller/internal/CalculatedStatus.java f87c32c 
>   ambari-server/src/test/java/org/apache/ambari/server/controller/internal/CalculatedStatusTest.java 4b8587f 
> 
> Diff: https://reviews.apache.org/r/40139/diff/
> 
> 
> Testing
> -------
> 
> checked on live cluster
> 
> mvn clean test in progress
> 
> 
> Thanks,
> 
> Dmitro Lisnichenko
> 
>


Re: Review Request 40139: SKIPPED_FAILED state should not be bubbled up to the Upgrade level

Posted by Nate Cole <nc...@hortonworks.com>.

> On Nov. 10, 2015, 2:31 p.m., Jonathan Hurley wrote:
> > ambari-server/src/main/java/org/apache/ambari/server/controller/internal/CalculatedStatus.java, lines 352-361
> > <https://reviews.apache.org/r/40139/diff/2/?file=1122084#file1122084line352>
> >
> >     So my main issue here is that for someone not completely familiar with this issue, they could easily use the other method which would produce a status of SKIPPED_FAILED.
> >     
> >     It's just too easy to pick the wrong method here when writing dependent code.
> >     
> >     With that said, why are we even bubbling this up? This should behave as any other failure:
> >     
> >     counters.get(HostRoleStatus.FAILED) > 0 && !skippable ? HostRoleStatus.FAILED :
> >     
> >     - If the command fails and it wasn't skippable, then the stage and request are FAILED
> >     - If the command fails and it was skippable, then the stage and request are COMPLETED but the command itself still failed
> >     
> >     Just because it was "auto skipped" doesn't mean it should display differently.
> 
> Jonathan Hurley wrote:
>     To clarify; auto skipped should leave the command as SKIPPED_FAILED, but the stage and request should not be - just like how it is for a failure which was skipped by hand.

The main thrust of this is that the UI guys want the Stage (Upgrade Item and Upgrade Group) to show SKIPPED_FAILED with the icon they have (that "reply" arrow swoop), but the request (Upgrade) should not.  They want that without having to iterate to the Task json object to find out that there are SKIPPED_FAILED tasks.


- Nate


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40139/#review105932
-----------------------------------------------------------


On Nov. 10, 2015, 12:55 p.m., Dmitro Lisnichenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/40139/
> -----------------------------------------------------------
> 
> (Updated Nov. 10, 2015, 12:55 p.m.)
> 
> 
> Review request for Ambari, Alejandro Fernandez, Jonathan Hurley, Jayush Luniya, Nate Cole, and Yusaku Sako.
> 
> 
> Bugs: AMBARI-13818
>     https://issues.apache.org/jira/browse/AMBARI-13818
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> When there is a skipped failure, the "upgrade" state itself comes SKIPPED_FAILED. Even when the upgrade is running or paused, it is returning "SKIPPED_FAILED". The API should not roll this up to the "upgrade" level as the current behavior is confusing. At the top level, it should just be HOLDING, IN_PROGRESS, COMPLETED, etc. SKIPPED_FAILED should be bubbled up to the upgrade group level and stop there.
> 
> 
> Also fixes another blocker:
> STR:
> 1) Install and deploy cluster with older HDP version
> 2) Enable NameNode HA
> 3) Register, install new HDP version
> 4) Start Rolling Upgrade with "Skip all Service Check failures" and "Skip all Slave Component failures" options
> 5) Break datanode_upgrade.py script and wait for Core Slaves failures
> 6) Click "Pause upgrade" on "Core Slaves - >Verifying Skipped Failures" step
> Result:
> Button "Resume upgrade" doesn't work. After clicking on this button I've got next http response
> {
>   "status" : 400,
>   "message" : "java.lang.IllegalArgumentException: Can only set status to PENDING when the upgrade is ABORTED (currently SKIPPED_FAILED)"
> }
> 
> 
> Diffs
> -----
> 
>   ambari-server/src/main/java/org/apache/ambari/server/controller/internal/CalculatedStatus.java f87c32c 
>   ambari-server/src/test/java/org/apache/ambari/server/controller/internal/CalculatedStatusTest.java 4b8587f 
> 
> Diff: https://reviews.apache.org/r/40139/diff/
> 
> 
> Testing
> -------
> 
> checked on live cluster
> 
> mvn clean test in progress
> 
> 
> Thanks,
> 
> Dmitro Lisnichenko
> 
>


Re: Review Request 40139: SKIPPED_FAILED state should not be bubbled up to the Upgrade level

Posted by Dmitro Lisnichenko <dl...@hortonworks.com>.

> On Nov. 10, 2015, 9:31 p.m., Jonathan Hurley wrote:
> > ambari-server/src/main/java/org/apache/ambari/server/controller/internal/CalculatedStatus.java, lines 352-361
> > <https://reviews.apache.org/r/40139/diff/2/?file=1122084#file1122084line352>
> >
> >     So my main issue here is that for someone not completely familiar with this issue, they could easily use the other method which would produce a status of SKIPPED_FAILED.
> >     
> >     It's just too easy to pick the wrong method here when writing dependent code.
> >     
> >     With that said, why are we even bubbling this up? This should behave as any other failure:
> >     
> >     counters.get(HostRoleStatus.FAILED) > 0 && !skippable ? HostRoleStatus.FAILED :
> >     
> >     - If the command fails and it wasn't skippable, then the stage and request are FAILED
> >     - If the command fails and it was skippable, then the stage and request are COMPLETED but the command itself still failed
> >     
> >     Just because it was "auto skipped" doesn't mean it should display differently.
> 
> Jonathan Hurley wrote:
>     To clarify; auto skipped should leave the command as SKIPPED_FAILED, but the stage and request should not be - just like how it is for a failure which was skipped by hand.
> 
> Nate Cole wrote:
>     The main thrust of this is that the UI guys want the Stage (Upgrade Item and Upgrade Group) to show SKIPPED_FAILED with the icon they have (that "reply" arrow swoop), but the request (Upgrade) should not.  They want that without having to iterate to the Task json object to find out that there are SKIPPED_FAILED tasks.
> 
> Jonathan Hurley wrote:
>     I think they should have to iterate over it - logic-wise, there's no difference between this and a normal skipped failure. The state calculations should remain the same. If the UI needs to display different information, then they should do the iteration work.

Jonathan, my modifications affect only summary request status. Stage state calculations remain the same. Do you mean that UI should handle SKIPPED_FAILED itself on request level, and patch is not required at all?


- Dmitro


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40139/#review105932
-----------------------------------------------------------


On Nov. 10, 2015, 7:55 p.m., Dmitro Lisnichenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/40139/
> -----------------------------------------------------------
> 
> (Updated Nov. 10, 2015, 7:55 p.m.)
> 
> 
> Review request for Ambari, Alejandro Fernandez, Jonathan Hurley, Jayush Luniya, Nate Cole, and Yusaku Sako.
> 
> 
> Bugs: AMBARI-13818
>     https://issues.apache.org/jira/browse/AMBARI-13818
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> When there is a skipped failure, the "upgrade" state itself comes SKIPPED_FAILED. Even when the upgrade is running or paused, it is returning "SKIPPED_FAILED". The API should not roll this up to the "upgrade" level as the current behavior is confusing. At the top level, it should just be HOLDING, IN_PROGRESS, COMPLETED, etc. SKIPPED_FAILED should be bubbled up to the upgrade group level and stop there.
> 
> 
> Also fixes another blocker:
> STR:
> 1) Install and deploy cluster with older HDP version
> 2) Enable NameNode HA
> 3) Register, install new HDP version
> 4) Start Rolling Upgrade with "Skip all Service Check failures" and "Skip all Slave Component failures" options
> 5) Break datanode_upgrade.py script and wait for Core Slaves failures
> 6) Click "Pause upgrade" on "Core Slaves - >Verifying Skipped Failures" step
> Result:
> Button "Resume upgrade" doesn't work. After clicking on this button I've got next http response
> {
>   "status" : 400,
>   "message" : "java.lang.IllegalArgumentException: Can only set status to PENDING when the upgrade is ABORTED (currently SKIPPED_FAILED)"
> }
> 
> 
> Diffs
> -----
> 
>   ambari-server/src/main/java/org/apache/ambari/server/controller/internal/CalculatedStatus.java f87c32c 
>   ambari-server/src/test/java/org/apache/ambari/server/controller/internal/CalculatedStatusTest.java 4b8587f 
> 
> Diff: https://reviews.apache.org/r/40139/diff/
> 
> 
> Testing
> -------
> 
> checked on live cluster
> 
> mvn clean test in progress
> 
> 
> Thanks,
> 
> Dmitro Lisnichenko
> 
>


Re: Review Request 40139: SKIPPED_FAILED state should not be bubbled up to the Upgrade level

Posted by Dmitro Lisnichenko <dl...@hortonworks.com>.

> On Nov. 10, 2015, 9:31 p.m., Jonathan Hurley wrote:
> > ambari-server/src/main/java/org/apache/ambari/server/controller/internal/CalculatedStatus.java, lines 352-361
> > <https://reviews.apache.org/r/40139/diff/2/?file=1122084#file1122084line352>
> >
> >     So my main issue here is that for someone not completely familiar with this issue, they could easily use the other method which would produce a status of SKIPPED_FAILED.
> >     
> >     It's just too easy to pick the wrong method here when writing dependent code.
> >     
> >     With that said, why are we even bubbling this up? This should behave as any other failure:
> >     
> >     counters.get(HostRoleStatus.FAILED) > 0 && !skippable ? HostRoleStatus.FAILED :
> >     
> >     - If the command fails and it wasn't skippable, then the stage and request are FAILED
> >     - If the command fails and it was skippable, then the stage and request are COMPLETED but the command itself still failed
> >     
> >     Just because it was "auto skipped" doesn't mean it should display differently.
> 
> Jonathan Hurley wrote:
>     To clarify; auto skipped should leave the command as SKIPPED_FAILED, but the stage and request should not be - just like how it is for a failure which was skipped by hand.
> 
> Nate Cole wrote:
>     The main thrust of this is that the UI guys want the Stage (Upgrade Item and Upgrade Group) to show SKIPPED_FAILED with the icon they have (that "reply" arrow swoop), but the request (Upgrade) should not.  They want that without having to iterate to the Task json object to find out that there are SKIPPED_FAILED tasks.
> 
> Jonathan Hurley wrote:
>     I think they should have to iterate over it - logic-wise, there's no difference between this and a normal skipped failure. The state calculations should remain the same. If the UI needs to display different information, then they should do the iteration work.
> 
> Dmitro Lisnichenko wrote:
>     Jonathan, my modifications affect only summary request status. Stage state calculations remain the same. Do you mean that UI should handle SKIPPED_FAILED itself on request level, and patch is not required at all?

Read discussion in jira and got it


- Dmitro


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40139/#review105932
-----------------------------------------------------------


On Nov. 10, 2015, 7:55 p.m., Dmitro Lisnichenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/40139/
> -----------------------------------------------------------
> 
> (Updated Nov. 10, 2015, 7:55 p.m.)
> 
> 
> Review request for Ambari, Alejandro Fernandez, Jonathan Hurley, Jayush Luniya, Nate Cole, and Yusaku Sako.
> 
> 
> Bugs: AMBARI-13818
>     https://issues.apache.org/jira/browse/AMBARI-13818
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> When there is a skipped failure, the "upgrade" state itself comes SKIPPED_FAILED. Even when the upgrade is running or paused, it is returning "SKIPPED_FAILED". The API should not roll this up to the "upgrade" level as the current behavior is confusing. At the top level, it should just be HOLDING, IN_PROGRESS, COMPLETED, etc. SKIPPED_FAILED should be bubbled up to the upgrade group level and stop there.
> 
> 
> Also fixes another blocker:
> STR:
> 1) Install and deploy cluster with older HDP version
> 2) Enable NameNode HA
> 3) Register, install new HDP version
> 4) Start Rolling Upgrade with "Skip all Service Check failures" and "Skip all Slave Component failures" options
> 5) Break datanode_upgrade.py script and wait for Core Slaves failures
> 6) Click "Pause upgrade" on "Core Slaves - >Verifying Skipped Failures" step
> Result:
> Button "Resume upgrade" doesn't work. After clicking on this button I've got next http response
> {
>   "status" : 400,
>   "message" : "java.lang.IllegalArgumentException: Can only set status to PENDING when the upgrade is ABORTED (currently SKIPPED_FAILED)"
> }
> 
> 
> Diffs
> -----
> 
>   ambari-server/src/main/java/org/apache/ambari/server/controller/internal/CalculatedStatus.java f87c32c 
>   ambari-server/src/test/java/org/apache/ambari/server/controller/internal/CalculatedStatusTest.java 4b8587f 
> 
> Diff: https://reviews.apache.org/r/40139/diff/
> 
> 
> Testing
> -------
> 
> checked on live cluster
> 
> mvn clean test in progress
> 
> 
> Thanks,
> 
> Dmitro Lisnichenko
> 
>


Re: Review Request 40139: SKIPPED_FAILED state should not be bubbled up to the Upgrade level

Posted by Jonathan Hurley <jh...@hortonworks.com>.

> On Nov. 10, 2015, 2:31 p.m., Jonathan Hurley wrote:
> > ambari-server/src/main/java/org/apache/ambari/server/controller/internal/CalculatedStatus.java, lines 352-361
> > <https://reviews.apache.org/r/40139/diff/2/?file=1122084#file1122084line352>
> >
> >     So my main issue here is that for someone not completely familiar with this issue, they could easily use the other method which would produce a status of SKIPPED_FAILED.
> >     
> >     It's just too easy to pick the wrong method here when writing dependent code.
> >     
> >     With that said, why are we even bubbling this up? This should behave as any other failure:
> >     
> >     counters.get(HostRoleStatus.FAILED) > 0 && !skippable ? HostRoleStatus.FAILED :
> >     
> >     - If the command fails and it wasn't skippable, then the stage and request are FAILED
> >     - If the command fails and it was skippable, then the stage and request are COMPLETED but the command itself still failed
> >     
> >     Just because it was "auto skipped" doesn't mean it should display differently.

To clarify; auto skipped should leave the command as SKIPPED_FAILED, but the stage and request should not be - just like how it is for a failure which was skipped by hand.


- Jonathan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40139/#review105932
-----------------------------------------------------------


On Nov. 10, 2015, 12:55 p.m., Dmitro Lisnichenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/40139/
> -----------------------------------------------------------
> 
> (Updated Nov. 10, 2015, 12:55 p.m.)
> 
> 
> Review request for Ambari, Alejandro Fernandez, Jonathan Hurley, Jayush Luniya, Nate Cole, and Yusaku Sako.
> 
> 
> Bugs: AMBARI-13818
>     https://issues.apache.org/jira/browse/AMBARI-13818
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> When there is a skipped failure, the "upgrade" state itself comes SKIPPED_FAILED. Even when the upgrade is running or paused, it is returning "SKIPPED_FAILED". The API should not roll this up to the "upgrade" level as the current behavior is confusing. At the top level, it should just be HOLDING, IN_PROGRESS, COMPLETED, etc. SKIPPED_FAILED should be bubbled up to the upgrade group level and stop there.
> 
> 
> Also fixes another blocker:
> STR:
> 1) Install and deploy cluster with older HDP version
> 2) Enable NameNode HA
> 3) Register, install new HDP version
> 4) Start Rolling Upgrade with "Skip all Service Check failures" and "Skip all Slave Component failures" options
> 5) Break datanode_upgrade.py script and wait for Core Slaves failures
> 6) Click "Pause upgrade" on "Core Slaves - >Verifying Skipped Failures" step
> Result:
> Button "Resume upgrade" doesn't work. After clicking on this button I've got next http response
> {
>   "status" : 400,
>   "message" : "java.lang.IllegalArgumentException: Can only set status to PENDING when the upgrade is ABORTED (currently SKIPPED_FAILED)"
> }
> 
> 
> Diffs
> -----
> 
>   ambari-server/src/main/java/org/apache/ambari/server/controller/internal/CalculatedStatus.java f87c32c 
>   ambari-server/src/test/java/org/apache/ambari/server/controller/internal/CalculatedStatusTest.java 4b8587f 
> 
> Diff: https://reviews.apache.org/r/40139/diff/
> 
> 
> Testing
> -------
> 
> checked on live cluster
> 
> mvn clean test in progress
> 
> 
> Thanks,
> 
> Dmitro Lisnichenko
> 
>


Re: Review Request 40139: SKIPPED_FAILED state should not be bubbled up to the Upgrade level

Posted by Jonathan Hurley <jh...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40139/#review105932
-----------------------------------------------------------



ambari-server/src/main/java/org/apache/ambari/server/controller/internal/CalculatedStatus.java (lines 352 - 361)
<https://reviews.apache.org/r/40139/#comment164631>

    So my main issue here is that for someone not completely familiar with this issue, they could easily use the other method which would produce a status of SKIPPED_FAILED.
    
    It's just too easy to pick the wrong method here when writing dependent code.
    
    With that said, why are we even bubbling this up? This should behave as any other failure:
    
    counters.get(HostRoleStatus.FAILED) > 0 && !skippable ? HostRoleStatus.FAILED :
    
    - If the command fails and it wasn't skippable, then the stage and request are FAILED
    - If the command fails and it was skippable, then the stage and request are COMPLETED but the command itself still failed
    
    Just because it was "auto skipped" doesn't mean it should display differently.


- Jonathan Hurley


On Nov. 10, 2015, 12:55 p.m., Dmitro Lisnichenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/40139/
> -----------------------------------------------------------
> 
> (Updated Nov. 10, 2015, 12:55 p.m.)
> 
> 
> Review request for Ambari, Alejandro Fernandez, Jonathan Hurley, Jayush Luniya, Nate Cole, and Yusaku Sako.
> 
> 
> Bugs: AMBARI-13818
>     https://issues.apache.org/jira/browse/AMBARI-13818
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> When there is a skipped failure, the "upgrade" state itself comes SKIPPED_FAILED. Even when the upgrade is running or paused, it is returning "SKIPPED_FAILED". The API should not roll this up to the "upgrade" level as the current behavior is confusing. At the top level, it should just be HOLDING, IN_PROGRESS, COMPLETED, etc. SKIPPED_FAILED should be bubbled up to the upgrade group level and stop there.
> 
> 
> Also fixes another blocker:
> STR:
> 1) Install and deploy cluster with older HDP version
> 2) Enable NameNode HA
> 3) Register, install new HDP version
> 4) Start Rolling Upgrade with "Skip all Service Check failures" and "Skip all Slave Component failures" options
> 5) Break datanode_upgrade.py script and wait for Core Slaves failures
> 6) Click "Pause upgrade" on "Core Slaves - >Verifying Skipped Failures" step
> Result:
> Button "Resume upgrade" doesn't work. After clicking on this button I've got next http response
> {
>   "status" : 400,
>   "message" : "java.lang.IllegalArgumentException: Can only set status to PENDING when the upgrade is ABORTED (currently SKIPPED_FAILED)"
> }
> 
> 
> Diffs
> -----
> 
>   ambari-server/src/main/java/org/apache/ambari/server/controller/internal/CalculatedStatus.java f87c32c 
>   ambari-server/src/test/java/org/apache/ambari/server/controller/internal/CalculatedStatusTest.java 4b8587f 
> 
> Diff: https://reviews.apache.org/r/40139/diff/
> 
> 
> Testing
> -------
> 
> checked on live cluster
> 
> mvn clean test in progress
> 
> 
> Thanks,
> 
> Dmitro Lisnichenko
> 
>


Re: Review Request 40139: SKIPPED_FAILED state should not be bubbled up to the Upgrade level

Posted by Dmitro Lisnichenko <dl...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40139/
-----------------------------------------------------------

(Updated Nov. 10, 2015, 7:55 p.m.)


Review request for Ambari, Alejandro Fernandez, Jonathan Hurley, Jayush Luniya, Nate Cole, and Yusaku Sako.


Changes
-------

Sorry, forgot to upload an updated patch


Bugs: AMBARI-13818
    https://issues.apache.org/jira/browse/AMBARI-13818


Repository: ambari


Description
-------

When there is a skipped failure, the "upgrade" state itself comes SKIPPED_FAILED. Even when the upgrade is running or paused, it is returning "SKIPPED_FAILED". The API should not roll this up to the "upgrade" level as the current behavior is confusing. At the top level, it should just be HOLDING, IN_PROGRESS, COMPLETED, etc. SKIPPED_FAILED should be bubbled up to the upgrade group level and stop there.


Also fixes another blocker:
STR:
1) Install and deploy cluster with older HDP version
2) Enable NameNode HA
3) Register, install new HDP version
4) Start Rolling Upgrade with "Skip all Service Check failures" and "Skip all Slave Component failures" options
5) Break datanode_upgrade.py script and wait for Core Slaves failures
6) Click "Pause upgrade" on "Core Slaves - >Verifying Skipped Failures" step
Result:
Button "Resume upgrade" doesn't work. After clicking on this button I've got next http response
{
  "status" : 400,
  "message" : "java.lang.IllegalArgumentException: Can only set status to PENDING when the upgrade is ABORTED (currently SKIPPED_FAILED)"
}


Diffs (updated)
-----

  ambari-server/src/main/java/org/apache/ambari/server/controller/internal/CalculatedStatus.java f87c32c 
  ambari-server/src/test/java/org/apache/ambari/server/controller/internal/CalculatedStatusTest.java 4b8587f 

Diff: https://reviews.apache.org/r/40139/diff/


Testing
-------

checked on live cluster

mvn clean test in progress


Thanks,

Dmitro Lisnichenko


Re: Review Request 40139: SKIPPED_FAILED state should not be bubbled up to the Upgrade level

Posted by Nate Cole <nc...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40139/#review105906
-----------------------------------------------------------


Update unit tests for this.


ambari-server/src/main/java/org/apache/ambari/server/controller/internal/CalculatedStatus.java (line 366)
<https://reviews.apache.org/r/40139/#comment164599>

    These would be exposed via unit tests Somewhere.  I don't know if SKIPPED_FAILED should be considered FAILED.  That makes the whole request look like it failed.  Can't you just ignore SKIPPED_FAILED in this huge ternary?  Maybe something like:
    
    counters.get(HostRoleStatus.SKIPPED_FAILED) > 0 && !ignoreSkippedFailed ? HostRoleStatus.SKIPPED_FAILED // ignoreSkipFailed is a new parameter for the existing method
    
    Stages and UpgradeGroups must still be SKIPPED_FAILED, it's only the *Request* that shouldn't count them.  In the snippet I provided, the fall-through will still make the request IN_PROGRESS, which I think is desirable.


- Nate Cole


On Nov. 10, 2015, 12:30 p.m., Dmitro Lisnichenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/40139/
> -----------------------------------------------------------
> 
> (Updated Nov. 10, 2015, 12:30 p.m.)
> 
> 
> Review request for Ambari, Alejandro Fernandez, Jonathan Hurley, Jayush Luniya, Nate Cole, and Yusaku Sako.
> 
> 
> Bugs: AMBARI-13818
>     https://issues.apache.org/jira/browse/AMBARI-13818
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> When there is a skipped failure, the "upgrade" state itself comes SKIPPED_FAILED. Even when the upgrade is running or paused, it is returning "SKIPPED_FAILED". The API should not roll this up to the "upgrade" level as the current behavior is confusing. At the top level, it should just be HOLDING, IN_PROGRESS, COMPLETED, etc. SKIPPED_FAILED should be bubbled up to the upgrade group level and stop there.
> 
> 
> Also fixes another blocker:
> STR:
> 1) Install and deploy cluster with older HDP version
> 2) Enable NameNode HA
> 3) Register, install new HDP version
> 4) Start Rolling Upgrade with "Skip all Service Check failures" and "Skip all Slave Component failures" options
> 5) Break datanode_upgrade.py script and wait for Core Slaves failures
> 6) Click "Pause upgrade" on "Core Slaves - >Verifying Skipped Failures" step
> Result:
> Button "Resume upgrade" doesn't work. After clicking on this button I've got next http response
> {
>   "status" : 400,
>   "message" : "java.lang.IllegalArgumentException: Can only set status to PENDING when the upgrade is ABORTED (currently SKIPPED_FAILED)"
> }
> 
> 
> Diffs
> -----
> 
>   ambari-server/src/main/java/org/apache/ambari/server/controller/internal/CalculatedStatus.java f87c32c 
> 
> Diff: https://reviews.apache.org/r/40139/diff/
> 
> 
> Testing
> -------
> 
> checked on live cluster
> 
> mvn clean test in progress
> 
> 
> Thanks,
> 
> Dmitro Lisnichenko
> 
>