Posted to reviews@ambari.apache.org by Jonathan Hurley <jh...@hortonworks.com> on 2017/12/11 17:04:40 UTC
Re: Review Request 64502: YARN Shuffle Service Can't Be Found On
Client-Only Nodes After New Cluster Install
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64502/
-----------------------------------------------------------
(Updated Dec. 11, 2017, 12:04 p.m.)
Review request for Ambari, Dmitro Lisnichenko and Nate Cole.
Changes
-------
I realized that downloading configurations is also problematic since it's done on the Ambari server and not on the real cluster. As such, we should also pass down the component version structure in client config download commands.
Bugs: AMBARI-22628
https://issues.apache.org/jira/browse/AMBARI-22628
Repository: ambari
Description
-------
Installing a new cluster can write values to yarn-site.xml in which the Spark classpaths contain {{None}} in place of the stack version:
```
<property>
<name>yarn.nodemanager.aux-services.spark2_shuffle.classpath</name>
<value>/usr/hdp/None/spark2/aux/*</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.spark_shuffle.classpath</name>
<value>/usr/hdp/None/spark/aux/*</value>
</property>
<property>
<name>yarn.timeline-service.entity-group-fs-store.group-id-plugin-classpath</name>
<value>/usr/hdp/None/spark/hdpLib/*</value>
</property>
```
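One way to spot an affected file is to scan it for values with a literal None path segment. The following is a minimal sketch, not part of the patch: the function name, the `<configuration>` root element, and the version string in the sample are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Sample input: the <configuration> wrapper and the "2.6.3.0-235" version
# are made up for this sketch; only the property names come from the report.
SAMPLE = """<configuration>
  <property>
    <name>yarn.nodemanager.aux-services.spark2_shuffle.classpath</name>
    <value>/usr/hdp/None/spark2/aux/*</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.spark_shuffle.classpath</name>
    <value>/usr/hdp/2.6.3.0-235/spark/aux/*</value>
  </property>
</configuration>"""


def find_unresolved_properties(xml_text):
    """Return names of properties whose value has a literal 'None' path segment."""
    root = ET.fromstring(xml_text)
    unresolved = []
    for prop in root.iter("property"):
        name = prop.findtext("name", default="")
        value = prop.findtext("value", default="")
        # Compare whole path segments so names that merely contain
        # the letters "None" are not flagged.
        if "None" in value.split("/"):
            unresolved.append(name)
    return unresolved


if __name__ == "__main__":
    for name in find_unresolved_properties(SAMPLE):
        print("unresolved version in:", name)
```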
The cause is that YARN clients on hosts without daemons never receive a restart command after the initial {{yarn-site.xml}} is written, so they never get a chance to fill in the correct values. This causes problems when jobs are run on these nodes:
```
2017-12-04 10:16:41,789 INFO service.AbstractService (AbstractService.java:noteFailure(272)) - Service org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices failed in state INITED; cause: java.lang.ClassNotFoundException: org.apache.spark.network.yarn.YarnShuffleService
java.lang.ClassNotFoundException: org.apache.spark.network.yarn.YarnShuffleService
```
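The literal None in the path is consistent with a Python script formatting a version variable that was never set. The sketch below only illustrates that mechanism; it is not the code in params_linux.py, and both function names and the stack root argument are hypothetical.

```python
def build_classpath(stack_root, version, suffix):
    """Join the pieces the naive way; a missing version is not detected."""
    # str.format() renders None as the text "None", so an unset version
    # silently ends up inside the path.
    return "{0}/{1}/{2}".format(stack_root, version, suffix)


def build_classpath_checked(stack_root, version, suffix):
    """Same as above, but refuse to build a path from an unresolved version."""
    if version is None:
        raise ValueError("component version has not been resolved yet")
    return build_classpath(stack_root, version, suffix)


if __name__ == "__main__":
    print(build_classpath("/usr/hdp", None, "spark2/aux/*"))
    try:
        build_classpath_checked("/usr/hdp", None, "spark2/aux/*")
    except ValueError as error:
        print("refused:", error)
```

Failing loudly is only one possible design choice; the review itself takes the route of making sure the version is available to the script in the first place.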
Diffs (updated)
-----
ambari-server/src/main/java/org/apache/ambari/server/agent/ExecutionCommand.java 9d5e29ee8a
ambari-server/src/main/java/org/apache/ambari/server/controller/internal/ClientConfigResourceProvider.java a7c712bd1a
ambari-server/src/main/java/org/apache/ambari/server/state/Cluster.java 15efcd2173
ambari-server/src/main/java/org/apache/ambari/server/state/cluster/ClusterImpl.java ce328f91ff
ambari-server/src/main/resources/common-services/YARN/2.1.0.2.0/package/scripts/params_linux.py 98141456c7
Diff: https://reviews.apache.org/r/64502/diff/3/
Changes: https://reviews.apache.org/r/64502/diff/2-3/
Testing
-------
Manual install via UI and Blueprint
Thanks,
Jonathan Hurley
Re: Review Request 64502: YARN Shuffle Service Can't Be Found On
Client-Only Nodes After New Cluster Install
Posted by Jonathan Hurley <jh...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64502/#review193403
-----------------------------------------------------------
ambari-server/src/main/java/org/apache/ambari/server/state/cluster/ClusterImpl.java
Lines 2812-2841 (patched)
<https://reviews.apache.org/r/64502/#comment271958>
There is a change to the logic here. We're only going to send down a component version if it advertises a version AND it's been resolved.
There's no point in sending down a version if it's not trusted, as is the case with "latest" installations.
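The rule described in this comment can be sketched roughly as follows. This is an illustration in Python rather than the Java in ClusterImpl.java; the `Component` tuple, the `UNRESOLVED` markers (including "UNKNOWN"), and the sample version string are all assumptions, not names taken from the patch.

```python
from collections import namedtuple

# Hypothetical stand-in for a service component as the server sees it.
Component = namedtuple(
    "Component", ["service", "name", "advertises_version", "version"]
)

# Values treated as "not resolved" in this sketch (an assumption).
UNRESOLVED = {None, "", "UNKNOWN"}


def component_versions(components):
    """Build {service: {component: version}} for trusted versions only."""
    result = {}
    for component in components:
        # Both conditions must hold: the component advertises a version
        # AND that version has actually been resolved.
        if component.advertises_version and component.version not in UNRESOLVED:
            result.setdefault(component.service, {})[component.name] = component.version
    return result


if __name__ == "__main__":
    sample = [
        Component("YARN", "NODEMANAGER", True, "2.6.3.0-235"),
        Component("YARN", "YARN_CLIENT", True, "UNKNOWN"),
        Component("AMBARI_METRICS", "METRICS_MONITOR", False, None),
    ]
    print(component_versions(sample))
```

With this shape, a script that finds no entry for a component knows the version is untrusted and can fall back to another source instead of formatting an empty value into a path.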
- Jonathan Hurley
Re: Review Request 64502: YARN Shuffle Service Can't Be Found On
Client-Only Nodes After New Cluster Install
Posted by Dmitro Lisnichenko <dl...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64502/#review193410
-----------------------------------------------------------
Ship it!
- Dmitro Lisnichenko
Re: Review Request 64502: YARN Shuffle Service Can't Be Found On
Client-Only Nodes After New Cluster Install
Posted by Nate Cole <nc...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64502/#review193416
-----------------------------------------------------------
Ship it!
- Nate Cole