Posted to dev@zeppelin.apache.org by "Maziyar PANAHI (JIRA)" <ji...@apache.org> on 2019/02/03 11:25:00 UTC

[jira] [Created] (ZEPPELIN-3986) Cannot access any JAR in yarn cluster mode

Maziyar PANAHI created ZEPPELIN-3986:
----------------------------------------

             Summary: Cannot access any JAR in yarn cluster mode
                 Key: ZEPPELIN-3986
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3986
             Project: Zeppelin
          Issue Type: Bug
          Components: Interpreters
    Affects Versions: 0.8.1, 0.8.2
         Environment: Cloudera/CDH 6.1

Spark 2.4

Hadoop 3.0

Zeppelin 0.8.2 (built from the latest merged pull request)
            Reporter: Maziyar PANAHI


Hello,

YARN cluster mode was introduced in `0.8.0`, and the issue of not finding ZeppelinContext in that mode was fixed in `0.8.1`. However, I am unable to access any JAR in order to `import` it inside my notebook.

I have a CDH cluster where everything works in deployMode `client`, but the moment I switch to `cluster`, so that the driver is no longer on the same machine as the Zeppelin server, it can't find the packages.

Working configs:

Inside interpreter:

master: yarn

spark.submit.deployMode: client

Inside `zeppelin-env.sh`:

{code:java}
export SPARK_SUBMIT_OPTIONS="--jars hdfs:///user/maziyar/jars/zeppelin/graphframes/graphframes-assembly-0.7.0-spark2.3-SNAPSHOT.jar"
{code}

Since the JAR is already on HDFS, switching to `cluster` should be as simple as changing `spark.submit.deployMode` to `cluster`. However, doing so results in:

{code:java}
import org.graphframes._

<console>:23: error: object graphframes is not a member of package org
       import org.graphframes._
{code}
I can see my JAR in the Spark UI under `spark.yarn.dist.jars` and `spark.yarn.secondary.jars` in both cluster and client mode.

In client mode, `sc.jars` returns:

{code:java}
res2: Seq[String] = List(file:/opt/zeppelin-0.8.2-new/interpreter/spark/spark-interpreter-0.8.2-SNAPSHOT.jar)
{code}

However, in `cluster` mode the same command returns an empty list. I thought maybe there is something extra or missing in the Zeppelin Spark interpreter that prevents the JAR from being used in cluster mode.
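To illustrate, here is a small diagnostic I can run in a notebook paragraph to check whether a class from the distributed JAR is actually visible to the driver (a sketch; `classOnDriver` is just a hypothetical helper name, and `org.graphframes.GraphFrame` is the class I expect the assembly jar to provide):

{code:scala}
// Diagnostic sketch for a notebook paragraph (not part of Zeppelin itself):
// reports whether a class can be loaded by the driver's classloader.
def classOnDriver(name: String): Boolean =
  try { Class.forName(name); true }
  catch { case _: ClassNotFoundException => false }

// Sanity check with a class that is always on the JVM classpath:
println(classOnDriver("java.lang.String"))           // true
// The class I actually care about, from the graphframes assembly:
println(classOnDriver("org.graphframes.GraphFrame"))
{code}

In `client` mode the second check prints `true`; in `cluster` mode it prints `false`, which is consistent with `sc.jars` coming back empty.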


This is how Spark UI reports my JAR in `client` mode:

|spark.repl.local.jars |file:/tmp/spark-3aadfe3c-8821-4dfe-875b-744c2e35a95a/graphframes-assembly-0.7.0-spark2.3-SNAPSHOT.jar|
|spark.yarn.dist.jars |hdfs://hadoop-master-1:8020/user/mpanahi/jars/zeppelin/graphframes/graphframes-assembly-0.7.0-spark2.3-SNAPSHOT.jar|
|spark.yarn.secondary.jars|graphframes-assembly-0.7.0-spark2.3-SNAPSHOT.jar|
|sun.java.command|org.apache.spark.deploy.SparkSubmit --master yarn --conf spark.executor.memory=5g --conf spark.driver.memory=8g --conf spark.driver.cores=4 --conf spark.yarn.isPython=true --conf spark.driver.extraClassPath=:/opt/zeppelin-0.8.2-new/interpreter/spark/*:/opt/zeppelin-0.8.2-new/zeppelin-interpreter/target/lib/*::/opt/zeppelin-0.8.2-new/zeppelin-interpreter/target/classes:/opt/zeppelin-0.8.2-new/zeppelin-interpreter/target/test-classes:/opt/zeppelin-0.8.2-new/zeppelin-zengine/target/test-classes:/opt/zeppelin-0.8.2-new/interpreter/spark/spark-interpreter-0.8.2-SNAPSHOT.jar --conf spark.useHiveContext=true --conf spark.app.name=Zeppelin --conf spark.executor.cores=5 --conf spark.submit.deployMode=client --conf spark.dynamicAllocation.maxExecutors=50 --conf spark.dynamicAllocation.initialExecutors=1 --conf spark.dynamicAllocation.enabled=true --conf spark.driver.extraJavaOptions= -Dfile.encoding=UTF-8 -Dlog4j.configuration=file:///opt/zeppelin-0.8.2-new/conf/log4j.properties -Dzeppelin.log.file=/var/log/zeppelin/zeppelin-interpreter-spark-mpanahi-zeppelin-hadoop-gateway.log --class org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer --jars hdfs:///user/mpanahi/jars/zeppelin/graphframes/graphframes-assembly-0.7.0-spark2.3-SNAPSHOT.jar,|


This is how Spark UI reports my JAR in `cluster` mode (same configs as I mentioned above):

|spark.repl.local.jars |This field does not exist in cluster mode|
|spark.yarn.dist.jars |hdfs://hadoop-master-1:8020/user/mpanahi/jars/zeppelin/graphframes/graphframes-assembly-0.7.0-spark2.3-SNAPSHOT.jar|
|spark.yarn.secondary.jars|graphframes-assembly-0.7.0-spark2.3-SNAPSHOT.jar|
|sun.java.command|org.apache.spark.deploy.yarn.ApplicationMaster --class org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer --jar file:/opt/zeppelin-0.8.2-new/interpreter/spark/spark-interpreter-0.8.2-SNAPSHOT.jar --arg 134.158.74.122 --arg 46130 --arg : --properties-file /yarn/nm/usercache/mpanahi/appcache/application_1547731772080_0077/container_1547731772080_0077_01_000001/__spark_conf__/__spark_conf__.properties|


Thank you.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)