Posted to reviews@ambari.apache.org by Dmitro Lisnichenko <dl...@hortonworks.com> on 2017/03/14 17:09:50 UTC

Review Request 57604: YARN service check failed during HDP 2.4-2.6 rolling upgrade with YARN HA enabled

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/57604/
-----------------------------------------------------------

Review request for Ambari, Jonathan Hurley, Nate Cole, and Vinod Kumar Vavilapalli.


Bugs: AMBARI-20447
    https://issues.apache.org/jira/browse/AMBARI-20447


Repository: ambari


Description
-------

The YARN service check fails during a rolling upgrade from HDP-2.4 to HDP-2.6 (with YARN HA turned on) for the following reasons:
# After the "core master restart" step, the YARN client uses the new (HDP-2.6) config and fails with "Class org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider not found". Forcing the YARN client to use the old (HDP-2.4) config until the client binary is updated helps here.
# After the "core slave restart" step, using the old YARN client config with the old YARN client binary does not help: the NM/RM classpath points to HDP-2.6. The app job gets scheduled, but then fails with this log:

{code}
17/03/06 16:39:27 INFO service.AbstractService: Service org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl failed in state STARTED; cause: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider not found
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2240)
at org.apache.hadoop.yarn.client.RMProxy.createRMFailoverProxyProvider(RMProxy.java:160)
at org.apache.hadoop.yarn.client.RMProxy.createRMProxy(RMProxy.java:93)
at org.apache.hadoop.yarn.client.ClientRMProxy.createRMProxy(ClientRMProxy.java:72)
at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.serviceStart(AMRMClientImpl.java:186)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl.serviceStart(AMRMClientAsyncImpl.java:96)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:559)
at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.main(ApplicationMaster.java:299)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2208)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2232)
... 9 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2114)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2206)
... 10 more
17/03/06 16:39:27 INFO service.AbstractService: Service org.apache.hadoop.yarn.client.api.async.AMRMClientAsync failed in state STARTED; cause: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider not found
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2240)
at org.apache.hadoop.yarn.client.RMProxy.createRMFailoverProxyProvider(RMProxy.java:160)
at org.apache.hadoop.yarn.client.RMProxy.createRMProxy(RMProxy.java:93)
at org.apache.hadoop.yarn.client.ClientRMProxy.createRMProxy(ClientRMProxy.java:72)
at
{code}
# After the YARN client is updated to the new binary, the service check works fine.
----

Bottom line: this is a known problem with DistributedShell. It was never fixed to stop relying on the cluster's configuration, which means that client configuration changes like this one can break DistributedShell apps across upgrades.
Unfortunately, nothing we do now can fix this broken upgrade for DistributedShell; to fix it properly, we would have to go back in time and provide the changes.
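
The mismatch described above can be modelled in a few lines of Python. This is only an illustrative sketch, not Hadoop code: the two "binary" sets are hypothetical stand-ins for the classes an old and a new client jar provide, and it assumes the new config selects the hedging provider through yarn.client.failover-proxy-provider (inferred from the stack trace, not stated in this request).

```python
# Class names as they appear in Hadoop; everything else here is hypothetical.
HEDGING = "org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider"
DEFAULT = "org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider"
PROVIDER_KEY = "yarn.client.failover-proxy-provider"

# Hypothetical stand-ins for the classes shipped by each client binary.
OLD_BINARY = {DEFAULT}
NEW_BINARY = {DEFAULT, HEDGING}

# Assumption: the new config names the hedging provider, the old one does not.
OLD_CONFIG = {PROVIDER_KEY: DEFAULT}
NEW_CONFIG = {PROVIDER_KEY: HEDGING}


class ClassNotFound(Exception):
    """Raised when the configured class is absent from the binary."""


def load_failover_provider(config, binary_classes):
    """Resolve the configured provider against the classes a binary ships."""
    name = config[PROVIDER_KEY]
    if name not in binary_classes:
        raise ClassNotFound("Class %s not found" % name)
    return name


def can_start(config, binary_classes):
    """Return True when the client could resolve its failover provider."""
    try:
        load_failover_provider(config, binary_classes)
        return True
    except ClassNotFound:
        return False
```

In this model, only the pairing of the new config with the old binary fails, which matches the behaviour seen before the client binary is updated.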

We have to do two things:
# Disable the DistributedShell-based service check when we go from 2.4 > 2.6. The RequestHedgingRMFailoverProxyProvider was added in 2.5, so 2.5 > 2.6 is fine.
# Also fix yarn-site.xml starting with 2.6 with the following change, to avoid this in the future. The change replaces $HADOOP_CONF_DIR, which is inherited from the NodeManager, with /etc/hadoop/conf/, which is always tied to the client version.
{code}
<property>
<name>yarn.application.classpath</name>
<value>/etc/hadoop/conf/,/usr/hdp/current/hadoop-client/*,/usr/hdp/current/hadoop-client/lib/*,/usr/hdp/current/hadoop-hdfs-client/*,/usr/hdp/current/hadoop-hdfs-client/lib/*,/usr/hdp/current/hadoop-yarn-client/*,/usr/hdp/current/hadoop-yarn-client/lib/*</value>
</property>
{code}
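
As a rough way to verify the yarn-site.xml change above, the following Python sketch parses a yarn-site.xml fragment and reports whether yarn.application.classpath still refers to $HADOOP_CONF_DIR. The helper names are hypothetical, and the wrapping <configuration> element is assumed from the usual Hadoop site-file layout; the property value is the one proposed above.

```python
import xml.etree.ElementTree as ET

# The property proposed above, wrapped in an assumed <configuration> root.
YARN_SITE_FRAGMENT = """
<configuration>
  <property>
    <name>yarn.application.classpath</name>
    <value>/etc/hadoop/conf/,/usr/hdp/current/hadoop-client/*,/usr/hdp/current/hadoop-client/lib/*,/usr/hdp/current/hadoop-hdfs-client/*,/usr/hdp/current/hadoop-hdfs-client/lib/*,/usr/hdp/current/hadoop-yarn-client/*,/usr/hdp/current/hadoop-yarn-client/lib/*</value>
  </property>
</configuration>
"""


def application_classpath(xml_text):
    """Return the yarn.application.classpath entries as a list of strings."""
    root = ET.fromstring(xml_text.strip())
    for prop in root.iter("property"):
        if prop.findtext("name") == "yarn.application.classpath":
            value = prop.findtext("value", "")
            return [entry.strip() for entry in value.split(",") if entry.strip()]
    return []


def inherits_nodemanager_conf(entries):
    """True if any entry still expands the NodeManager's HADOOP_CONF_DIR."""
    return any(
        "$HADOOP_CONF_DIR" in entry or "${HADOOP_CONF_DIR}" in entry
        for entry in entries
    )


entries = application_classpath(YARN_SITE_FRAGMENT)
print(inherits_nodemanager_conf(entries))
```

A check of this kind could be run against the deployed yarn-site.xml after the upgrade; how the file is located on a given host is left out of the sketch.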


Diffs
-----

  ambari-server/src/main/resources/stacks/HDP/2.3/upgrades/upgrade-2.6.xml c27b634efd 
  ambari-server/src/main/resources/stacks/HDP/2.4/upgrades/upgrade-2.6.xml dc92c2b46f 
  ambari-server/src/main/resources/stacks/HDP/2.5/upgrades/upgrade-2.6.xml ab6b2398b6 
  ambari-server/src/main/resources/stacks/HDP/2.6/services/YARN/configuration/yarn-site.xml 4b97148278 


Diff: https://reviews.apache.org/r/57604/diff/1/


Testing
-------

Checked that the 2.4->2.6 upgrade passes.

My first thought was that there was no need to skip the YARN service check after the slave restart (since the YARN 2.6 configuration is expected to be correct). But that is not the case, so I excluded the YARN service check on this step.
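
The rule for excluding the check can be expressed as a small Python sketch. It is not the upgrade-pack logic itself (that lives in the upgrade-2.6.xml files listed under Diffs); the function names are hypothetical, and the stack version in which the class first shipped is deliberately left as a parameter because the whole decision depends on it.

```python
def parse_version(text):
    """Turn a stack version such as "2.4" into a comparable tuple (2, 4)."""
    return tuple(int(part) for part in text.split("."))


def should_skip_dshell_check(source_version, class_added_in):
    """Skip the DistributedShell check when the source stack predates the class.

    source_version: stack version being upgraded from, e.g. "2.4".
    class_added_in: assumed first stack version that ships
                    RequestHedgingRMFailoverProxyProvider.
    """
    return parse_version(source_version) < parse_version(class_added_in)


# With the class assumed to arrive in 2.5, only the 2.4 source needs the skip.
print(should_skip_dshell_check("2.4", "2.5"))
print(should_skip_dshell_check("2.5", "2.5"))
```

Passing a different value for class_added_in changes which source versions are affected, so that assumption is worth confirming against the actual client jars.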

mvn clean test


Thanks,

Dmitro Lisnichenko


Re: Review Request 57604: YARN service check failed during HDP 2.4-2.6 rolling upgrade with YARN HA enabled

Posted by Dmitro Lisnichenko <dl...@hortonworks.com>.

> On March 15, 2017, 5:21 p.m., Nate Cole wrote:
> > ambari-server/src/main/resources/stacks/HDP/2.5/upgrades/upgrade-2.6.xml
> > Lines 219-222 (original), 219-222 (patched)
> > <https://reviews.apache.org/r/57604/diff/1/?file=1663928#file1663928line219>
> >
> >     I thought the class was in 2.5 such that this wouldn't be the same issue.  Did your tests show otherwise?

The missing class was added in 2.6.


- Dmitro


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/57604/#review169013
-----------------------------------------------------------




Re: Review Request 57604: YARN service check failed during HDP 2.4-2.6 rolling upgrade with YARN HA enabled

Posted by Nate Cole <nc...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/57604/#review169013
-----------------------------------------------------------


Fix it, then Ship it!





ambari-server/src/main/resources/stacks/HDP/2.5/upgrades/upgrade-2.6.xml
Lines 219-222 (original), 219-222 (patched)
<https://reviews.apache.org/r/57604/#comment241345>

    I thought the class was in 2.5 such that this wouldn't be the same issue.  Did your tests show otherwise?


- Nate Cole

