Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/12/09 22:52:59 UTC

[GitHub] [spark] vanzin commented on a change in pull request #26493: [SPARK-29574][K8S] Add SPARK_DIST_CLASSPATH to the executor class path

URL: https://github.com/apache/spark/pull/26493#discussion_r355730157
 
 

 ##########
 File path: docs/hadoop-provided.md
 ##########
 @@ -39,3 +39,25 @@ export SPARK_DIST_CLASSPATH=$(/path/to/hadoop/bin/hadoop classpath)
 export SPARK_DIST_CLASSPATH=$(hadoop --config /path/to/configs classpath)
 
 {% endhighlight %}
+
+# Hadoop Free Build Setup for Spark on Kubernetes  
+To run the Hadoop free build of Spark on Kubernetes, the executor image must have the appropriate version of Hadoop binaries and the correct `SPARK_DIST_CLASSPATH` value set. See the example below for the relevant changes needed in the executor Dockerfile:
+
+{% highlight bash %}
+### Set environment variables in the executor dockerfile ###
+
+ENV SPARK_HOME="/opt/spark"  
+ENV HADOOP_HOME="/opt/hadoop"  
+ENV PATH="$SPARK_HOME/bin:$HADOOP_HOME/bin:$PATH"  
+...  
+
+#Copy your target hadoop binaries to the executor hadoop home   
+
+COPY /opt/hadoop3  $HADOOP_HOME  
+...
+
+#Set your SPARK_DIST_CLASSPATH. Note it uses the hadoop executable to get the jars path and cannot be used with an ENV statement. You may need to add additional paths manually to SPARK_DIST_CLASSPATH if you have other dependencies for example in /hadoop/tools/lib/*  
+
+RUN export SPARK_DIST_CLASSPATH=$(hadoop classpath)  
 
 Review comment:
   Are you sure this works? I tried the following Dockerfile:
   
   ```
   FROM openjdk:8-jdk-slim
   RUN export MY_HOME="/foo/bar"
   ENTRYPOINT [ "/bin/bash", "-c", "echo $HOME:$MY_HOME" ]
   ```
   
   And it only prints the contents of `$HOME`, not `$MY_HOME`.
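
A likely explanation, sketched under assumptions: each `RUN` step executes in its own shell process, so a variable exported there is gone once the step finishes and never reaches the image environment or the entrypoint. The script below models two such steps with separate `sh -c` calls. The helper `set_dist_classpath` is hypothetical (the name is not from the PR) and only illustrates one possible alternative: resolving `SPARK_DIST_CLASSPATH` when the container starts, for instance from an entrypoint script, rather than at build time.

```shell
#!/bin/sh
# Model of two Dockerfile RUN steps: each `sh -c` is a separate shell process.
unset MY_HOME
first=$(sh -c 'export MY_HOME="/foo/bar"; printf "%s" "$MY_HOME"')
second=$(sh -c 'printf "%s" "${MY_HOME:-}"')
echo "inside the exporting shell: [$first]"
echo "in a later shell:           [$second]"

# Hypothetical startup helper (name and placement are assumptions, not from the PR):
# resolve the classpath at container start, and only if it is not already set
# and a `hadoop` command is available.
set_dist_classpath() {
  if [ -z "${SPARK_DIST_CLASSPATH:-}" ] && command -v hadoop >/dev/null 2>&1; then
    SPARK_DIST_CLASSPATH=$(hadoop classpath)
    export SPARK_DIST_CLASSPATH
  fi
}
```

Calling `set_dist_classpath` from the script that launches the executor would export the variable in the same process tree that starts the JVM, which a build-time `RUN export` cannot do.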
   
   
