Posted to users@zeppelin.apache.org by Sofiane Cherchalli <so...@gmail.com> on 2017/05/03 12:14:26 UTC

Running a notebook in a standalone cluster mode issues

Hi Moon and all,

I have a standalone Spark cluster with one master and one worker, and I
submit jobs through Zeppelin. The master, the worker, and Zeppelin each run
in a separate container.

My zeppelin-env.sh:

# spark home
export SPARK_HOME=/usr/local/spark

# set hadoop conf dir
export HADOOP_CONF_DIR=/opt/hadoop-2.7.3/etc/hadoop

# options to pass to the spark-submit command
export SPARK_SUBMIT_OPTIONS="--packages com.databricks:spark-csv_2.11:1.5.0 --deploy-mode cluster"

# driver memory and deploy mode
export ZEPPELIN_JAVA_OPTS="-Dspark.driver.memory=7g -Dspark.submit.deployMode=cluster"

# master
export MASTER="spark://<master>:7077"

My notebook code is very simple. It reads a CSV file and writes it back
into the previously created /data directory:
%spark.pyspark
def read_input(fin):
    '''
    Read input file from the filesystem and return a dataframe
    '''
    df = sqlContext.read.load(fin, format='com.databricks.spark.csv',
                              mode='PERMISSIVE', header='false',
                              inferSchema='true')
    return df

def write_output(df, fout):
    '''
    Write dataframe to the filesystem
    '''
    df.write.mode('overwrite').format('com.databricks.spark.csv') \
        .options(delimiter=',', header='true').save(fout)

data_in = '/data/01.csv'
data_out = '/data/02.csv'
df = read_input(data_in)
newdf = del_columns(df)
write_output(newdf, data_out)
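
(del_columns is not shown in the snippet above; it is presumably defined in
another paragraph. A hypothetical stand-in, just so the paragraph runs end
to end, could be:)

def del_columns(df, columns=None):
    # hypothetical stand-in for the helper used above: drop the given
    # columns if any were passed, otherwise return the dataframe unchanged
    return df.drop(*columns) if columns else df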


I set --deploy-mode to *cluster* so that the driver runs on the worker and
reads the CSV from the /data directory there, rather than in the Zeppelin
container. When I run the notebook, it complains that
/opt/zeppelin-0.7.1/interpreter/spark/zeppelin-spark_2.11-0.7.1.jar is
missing:
org.apache.zeppelin.interpreter.InterpreterException:
Ivy Default Cache set to: /root/.ivy2/cache
The jars for the packages stored in: /root/.ivy2/jars
:: loading settings :: url = jar:file:/opt/spark-2.1.0/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
com.databricks#spark-csv_2.11 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
	confs: [default]
	found com.databricks#spark-csv_2.11;1.5.0 in central
	found org.apache.commons#commons-csv;1.1 in central
	found com.univocity#univocity-parsers;1.5.1 in central
:: resolution report :: resolve 310ms :: artifacts dl 6ms
	:: modules in use:
	com.databricks#spark-csv_2.11;1.5.0 from central in [default]
	com.univocity#univocity-parsers;1.5.1 from central in [default]
	org.apache.commons#commons-csv;1.1 from central in [default]
	---------------------------------------------------------------------
	|                  |            modules            ||   artifacts   |
	|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
	---------------------------------------------------------------------
	|      default     |   3   |   0   |   0   |   0   ||   3   |   0   |
	---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent
	confs: [default]
	0 artifacts copied, 3 already retrieved (0kB/8ms)
Running Spark using the REST application submission protocol.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/zeppelin-0.7.1/interpreter/spark/zeppelin-spark_2.11-0.7.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/zeppelin-0.7.1/lib/interpreter/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Warning: Master endpoint spark://spark-drone-master-sofiane.autoetl.svc.cluster.local:7077 was not a REST server. Falling back to legacy submission gateway instead.
Ivy Default Cache set to: /root/.ivy2/cache
The jars for the packages stored in: /root/.ivy2/jars
com.databricks#spark-csv_2.11 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
	confs: [default]
	found com.databricks#spark-csv_2.11;1.5.0 in central
	found org.apache.commons#commons-csv;1.1 in central
	found com.univocity#univocity-parsers;1.5.1 in central
:: resolution report :: resolve 69ms :: artifacts dl 5ms
	:: modules in use:
	com.databricks#spark-csv_2.11;1.5.0 from central in [default]
	com.univocity#univocity-parsers;1.5.1 from central in [default]
	org.apache.commons#commons-csv;1.1 from central in [default]
	---------------------------------------------------------------------
	|                  |            modules            ||   artifacts   |
	|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
	---------------------------------------------------------------------
	|      default     |   3   |   0   |   0   |   0   ||   3   |   0   |
	---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent
	confs: [default]
	0 artifacts copied, 3 already retrieved (0kB/4ms)
java.nio.file.NoSuchFileException: /opt/zeppelin-0.7.1/interpreter/spark/zeppelin-spark_2.11-0.7.1.jar
	at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)




So what I did next was copy
/opt/zeppelin-0.7.1/interpreter/spark/zeppelin-spark_2.11-0.7.1.jar into
the worker container, restart the interpreter, and rerun the notebook. It
no longer complains about zeppelin-spark_2.11-0.7.1.jar, but I now get
another exception in the notebook, related to the
RemoteInterpreterManagedProcess:

org.apache.zeppelin.interpreter.InterpreterException:
Ivy Default Cache set to: /root/.ivy2/cache
The jars for the packages stored in: /root/.ivy2/jars
:: loading settings :: url = jar:file:/opt/spark-2.1.0/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
com.databricks#spark-csv_2.11 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
	confs: [default]
	found com.databricks#spark-csv_2.11;1.5.0 in central
	found org.apache.commons#commons-csv;1.1 in central
	found com.univocity#univocity-parsers;1.5.1 in central
:: resolution report :: resolve 277ms :: artifacts dl 7ms
	:: modules in use:
	com.databricks#spark-csv_2.11;1.5.0 from central in [default]
	com.univocity#univocity-parsers;1.5.1 from central in [default]
	org.apache.commons#commons-csv;1.1 from central in [default]
	---------------------------------------------------------------------
	|                  |            modules            ||   artifacts   |
	|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
	---------------------------------------------------------------------
	|      default     |   3   |   0   |   0   |   0   ||   3   |   0   |
	---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent
	confs: [default]
	0 artifacts copied, 3 already retrieved (0kB/8ms)
Running Spark using the REST application submission protocol.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/zeppelin-0.7.1/interpreter/spark/zeppelin-spark_2.11-0.7.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/zeppelin-0.7.1/lib/interpreter/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Warning: Master endpoint spark://spark-drone-master-sofiane.autoetl.svc.cluster.local:7077 was not a REST server. Falling back to legacy submission gateway instead.
Ivy Default Cache set to: /root/.ivy2/cache
The jars for the packages stored in: /root/.ivy2/jars
com.databricks#spark-csv_2.11 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
	confs: [default]
	found com.databricks#spark-csv_2.11;1.5.0 in central
	found org.apache.commons#commons-csv;1.1 in central
	found com.univocity#univocity-parsers;1.5.1 in central
:: resolution report :: resolve 66ms :: artifacts dl 5ms
	:: modules in use:
	com.databricks#spark-csv_2.11;1.5.0 from central in [default]
	com.univocity#univocity-parsers;1.5.1 from central in [default]
	org.apache.commons#commons-csv;1.1 from central in [default]
	---------------------------------------------------------------------
	|                  |            modules            ||   artifacts   |
	|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
	---------------------------------------------------------------------
	|      default     |   3   |   0   |   0   |   0   ||   3   |   0   |
	---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent
	confs: [default]
	0 artifacts copied, 3 already retrieved (0kB/4ms)
	at org.apache.zeppelin.interpreter.remote.RemoteInterpreterManagedProcess.start(RemoteInterpreterManagedProcess.java:143)
	at org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.reference(RemoteInterpreterProcess.java:73)
	at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.open(RemoteInterpreter.java:258)
	at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:423)
	at org.apache.zeppelin.interpreter.LazyOpenInterpreter.getFormType(LazyOpenInterpreter.java:106)
	at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:387)
	at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
	at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:329)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)



In the Spark jobs I see an
org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer running, and
its stderr log complains about a missing log4j.properties:

Launch Command: "/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java" "-cp"
"/opt/zeppelin-0.7.1/interpreter/spark/*:/opt/zeppelin-0.7.1/lib/interpreter/*:/opt/zeppelin-0.7.1/interpreter/spark/zeppelin-spark_2.11-0.7.1.jar:/usr/local/spark/conf/:/usr/local/spark/jars/*:/opt/hadoop-2.7.3/etc/hadoop:/opt/hadoop-2.7.3/etc/hadoop/*:/opt/hadoop-2.7.3/share/hadoop/common/lib/*:/opt/hadoop-2.7.3/share/hadoop/common/*:/opt/hadoop-2.7.3/share/hadoop/hdfs/*:/opt/hadoop-2.7.3/share/hadoop/hdfs/lib/*:/opt/hadoop-2.7.3/share/hadoop/hdfs/*:/opt/hadoop-2.7.3/share/hadoop/yarn/lib/*:/opt/hadoop-2.7.3/share/hadoop/yarn/*:/opt/hadoop-2.7.3/share/hadoop/mapreduce/lib/*:/opt/hadoop-2.7.3/share/hadoop/mapreduce/*:/opt/hadoop-2.7.3/share/hadoop/tools/lib/*"
"-Xmx1024M" "-Dspark.jars=file:/root/.ivy2/jars/com.databricks_spark-csv_2.11-1.5.0.jar,file:/root/.ivy2/jars/org.apache.commons_commons-csv-1.1.jar,file:/root/.ivy2/jars/com.univocity_univocity-parsers-1.5.1.jar,file:/root/.ivy2/jars/com.databricks_spark-csv_2.11-1.5.0.jar,file:/root/.ivy2/jars/org.apache.commons_commons-csv-1.1.jar,file:/root/.ivy2/jars/com.univocity_univocity-parsers-1.5.1.jar,file:/opt/zeppelin-0.7.1/interpreter/spark/zeppelin-spark_2.11-0.7.1.jar"
"-Dspark.driver.supervise=false" "-Dspark.driver.extraJavaOptions=
-Dfile.encoding=UTF-8
-Dlog4j.configuration=file:///opt/zeppelin-0.7.1/conf/log4j.properties
-Dzeppelin.log.file=/opt/zeppelin-0.7.1/logs/zeppelin-interpreter-spark--zeppelin-sofiane-1-zyfya.log"
"-Dspark.app.name=org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer"
"-Dspark.submit.deployMode=cluster"
"-Dspark.master=spark://spark-drone-master-sofiane.autoetl.svc.cluster.local:7077"
"-Dspark.driver.extraClassPath=::/opt/zeppelin-0.7.1/interpreter/spark/*:/opt/zeppelin-0.7.1/lib/interpreter/*::/opt/zeppelin-0.7.1/interpreter/spark/zeppelin-spark_2.11-0.7.1.jar"
"-Dspark.rpc.askTimeout=10s" "-Dfile.encoding=UTF-8"
"-Dlog4j.configuration=file:///opt/zeppelin-0.7.1/conf/log4j.properties"
"-Dzeppelin.log.file=/opt/zeppelin-0.7.1/logs/zeppelin-interpreter-spark--zeppelin-sofiane-1-zyfya.log"
"org.apache.spark.deploy.worker.DriverWrapper"
"spark://Worker@172.30.102.7:41417"
"/usr/local/spark/work/driver-20170503115405-0036/zeppelin-spark_2.11-0.7.1.jar"
"org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer"
"46151"
========================================

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/zeppelin-0.7.1/interpreter/spark/zeppelin-spark_2.11-0.7.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/spark/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
log4j:ERROR Could not read configuration file from URL [file:/opt/zeppelin-0.7.1/conf/log4j.properties].
java.io.FileNotFoundException: /opt/zeppelin-0.7.1/conf/log4j.properties (No such file or directory)
	at java.io.FileInputStream.open0(Native Method)
	at java.io.FileInputStream.open(FileInputStream.java:195)
	at java.io.FileInputStream.<init>(FileInputStream.java:138)
	at java.io.FileInputStream.<init>(FileInputStream.java:93)
	at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:90)
	at sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:188)
	at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:557)
	at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:526)
	at org.apache.log4j.LogManager.<clinit>(LogManager.java:127)
	at org.slf4j.impl.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:64)
	at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:285)
	at org.apache.commons.logging.impl.SLF4JLogFactory.getInstance(SLF4JLogFactory.java:155)
	at org.apache.commons.logging.impl.SLF4JLogFactory.getInstance(SLF4JLogFactory.java:132)
	at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:685)
	at org.apache.hadoop.security.UserGroupInformation.<clinit>(UserGroupInformation.java:85)
	at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2373)
	at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2373)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2373)
	at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:221)
	at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:42)
	at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
log4j:ERROR Ignoring configuration file [file:/opt/zeppelin-0.7.1/conf/log4j.properties].
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/05/03 11:54:06 INFO SecurityManager: Changing view acls to: root
17/05/03 11:54:06 INFO SecurityManager: Changing modify acls to: root
17/05/03 11:54:06 INFO SecurityManager: Changing view acls groups to:
17/05/03 11:54:06 INFO SecurityManager: Changing modify acls groups to:
17/05/03 11:54:06 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
17/05/03 11:54:07 INFO Utils: Successfully started service 'Driver' on port 39770.
17/05/03 11:54:07 INFO WorkerWatcher: Connecting to worker spark://Worker@172.30.102.7:41417
17/05/03 11:54:07 INFO TransportClientFactory: Successfully created connection to /172.30.102.7:41417 after 27 ms (0 ms spent in bootstraps)
17/05/03 11:54:07 INFO WorkerWatcher: Successfully connected to spark://Worker@172.30.102.7:41417
17/05/03 11:54:07 INFO RemoteInterpreterServer: Starting remote interpreter server on port 46151


The process never finishes, so I have to kill it...

What's going on? Is anything wrong with my configuration?

Any help appreciated. I have been struggling with this for a week.
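
Incidentally, the log4j:ERROR above only means that the driver launched on
the worker is pointed at /opt/zeppelin-0.7.1/conf/log4j.properties, a path
that exists in the Zeppelin container but not in the worker container.
Copying Zeppelin's conf/log4j.properties to that path on the worker, or
dropping in a minimal one, silences that particular error without fixing
the underlying problem. A minimal log4j 1.2 sketch:

# minimal log4j.properties sketch for /opt/zeppelin-0.7.1/conf/ on the worker
log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.target=System.err
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n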

Re: Running a notebook in a standalone cluster mode issues

Posted by Jeff Zhang <zj...@gmail.com>.
Or you can try the Livy interpreter, which supports YARN cluster mode:

https://zeppelin.apache.org/docs/0.8.0-SNAPSHOT/interpreter/livy.html
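
For reference, the Livy interpreter runs paragraphs through a Livy server,
so the Spark driver lives with the Livy session (which can be started in
yarn-cluster mode) instead of inside the Zeppelin container. A minimal
sketch, assuming a Livy server at http://livy-host:8998 (hypothetical host)
set in the interpreter property zeppelin.livy.url, and the spark-csv
package configured for the Livy session:

%livy.pyspark
# same read as in the original notebook, but executed in a Livy-managed
# session rather than in a driver spawned from the Zeppelin container
df = sqlContext.read.load('/data/01.csv',
                          format='com.databricks.spark.csv',
                          mode='PERMISSIVE', header='false',
                          inferSchema='true')
df.show()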


On Thu, May 4, 2017 at 3:49 AM, Sofiane Cherchalli <so...@gmail.com> wrote:

> Hi Moon,
>
> Great, I am keen to see ZEPPELIN-2040 resolved soon. But meanwhile is
> there any workaround?
>
> Thanks.
>
> Sofiane
>
>
> On Wed, May 3, 2017 at 20:40, moon soo Lee <mo...@apache.org>
> wrote:
>
>> Zeppelin doesn't need to be installed on every worker.
>> Until ZEPPELIN-2040 is resolved, you can think of the way the
>> SparkInterpreter in Zeppelin works as being very similar to spark-shell
>> (which runs in client mode).
>>
>> Therefore, if spark-shell works on a machine against your standalone
>> cluster, Zeppelin will work on the same machine with the standalone cluster.
>>
>> Thanks,
>> moon
>>
>> On Wed, May 3, 2017 at 2:28 PM Sofiane Cherchalli <so...@gmail.com>
>> wrote:
>>
>>> Hi Moon,
>>>
>>> So in my case, if I have a standalone or YARN cluster, the workaround
>>> would be to install Zeppelin alongside every worker, proxy them, and run
>>> each Zeppelin in client mode?
>>>
>>> Thanks,
>>> Sofiane
>>>
>>> On Wed, May 3, 2017 at 19:12, moon soo Lee <mo...@apache.org>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> Zeppelin does not support cluster deploy mode at the moment.
>>>> Fortunately, there will be support for cluster mode soon!
>>>> Please keep an eye on
>>>> https://issues.apache.org/jira/browse/ZEPPELIN-2040.
>>>>
>>>> Thanks,
>>>> moon
>>>>
>>>> On Wed, May 3, 2017 at 11:00 AM Sofiane Cherchalli <so...@gmail.com>
>>>> wrote:
>>>>
>>>>> Shall I configure a remote interpreter so that my notebook runs on
>>>>> the worker?
>>>>>
>>>>> Mayday!
>>>>>
>>>>> On Wed, May 3, 2017 at 4:18 PM, Sofiane Cherchalli <
>>>>> sofianito@gmail.com> wrote:
>>>>>
>>>>>> What port does the remote interpreter use?

Re: Running a notebook in a standalone cluster mode issues

Posted by moon soo Lee <mo...@apache.org>.
Any workaround other than using client mode is difficult to think of...

Thanks,
moon

On Wed, 3 May 2017 at 3:49 PM Sofiane Cherchalli <so...@gmail.com>
wrote:

> Hi Moon,
>
> Great, I am keen to see ZEPPELIN-2040 resolved soon. But meanwhile is
> there any workaround?
>
> Thanks.
>
> Sofiane
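
In concrete terms, falling back to client mode means dropping the
cluster-mode settings from the zeppelin-env.sh shown earlier. A minimal
sketch, assuming the /data files are also mounted into (or copied to) the
Zeppelin container, since the driver will now read them from there:

# zeppelin-env.sh sketch for client mode (the supported mode until ZEPPELIN-2040)
export SPARK_HOME=/usr/local/spark
export HADOOP_CONF_DIR=/opt/hadoop-2.7.3/etc/hadoop

# no --deploy-mode cluster and no spark.submit.deployMode=cluster:
# the driver now runs inside the Zeppelin container
export SPARK_SUBMIT_OPTIONS="--packages com.databricks:spark-csv_2.11:1.5.0"
export ZEPPELIN_JAVA_OPTS="-Dspark.driver.memory=7g"

# master
export MASTER="spark://<master>:7077"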

Re: Running a notebook in a standalone cluster mode issues

Posted by Sofiane Cherchalli <so...@gmail.com>.
Hi Moon,

Great, I am keen to see ZEPPELIN-2040 resolved soon. But in the meantime, is
there any workaround?

Thanks.
Sofiane


On Wed, May 3, 2017 at 8:40 PM, moon soo Lee <mo...@apache.org> wrote:

> Zeppelin doesn't need to be installed on every worker.
> You can think of the way the SparkInterpreter in Zeppelin works as very
> similar to spark-shell (which runs in client mode), until ZEPPELIN-2040 is
> resolved.
>
> Therefore, if spark-shell works on a machine against your standalone
> cluster, Zeppelin will work on that same machine with the standalone cluster.
>
> Thanks,
> moon
>
> On Wed, May 3, 2017 at 2:28 PM Sofiane Cherchalli <so...@gmail.com>
> wrote:
>
>> Hi Moon,
>>
>> So in my case, if I have a standalone or YARN cluster, the workaround
>> would be to install Zeppelin alongside every worker, proxy them, and run
>> each Zeppelin in client mode?
>>
>> Thanks,
>> Sofiane
>>
>> On Wed, May 3, 2017 at 7:12 PM, moon soo Lee <mo...@apache.org> wrote:
>>
>>> Hi,
>>>
>>> Zeppelin does not support cluster-mode deployment at the moment.
>>> Fortunately, there will be support for cluster mode soon!
>>> Please keep an eye on
>>> https://issues.apache.org/jira/browse/ZEPPELIN-2040.
>>>
>>> Thanks,
>>> moon
>>>
>>> On Wed, May 3, 2017 at 11:00 AM Sofiane Cherchalli <so...@gmail.com>
>>> wrote:
>>>
>>>> Shall I configure a remote interpreter to my notebook to run on the
>>>> worker?
>>>>
>>>> Mayday!
>>>>
>>>> On Wed, May 3, 2017 at 4:18 PM, Sofiane Cherchalli <sofianito@gmail.com
>>>> > wrote:
>>>>
>>>>> What port does the remote interpreter use?
>>>>>

Re: Running a notebook in a standalone cluster mode issues

Posted by moon soo Lee <mo...@apache.org>.
Zeppelin doesn't need to be installed on every worker.
You can think of the way the SparkInterpreter in Zeppelin works as very
similar to spark-shell (which runs in client mode), until ZEPPELIN-2040 is
resolved.

Therefore, if spark-shell works on a machine against your standalone
cluster, Zeppelin will work on that same machine with the standalone cluster.
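
As a concrete illustration, a minimal zeppelin-env.sh for that client-mode
workaround might look like the sketch below (the paths, package, and master
URL are assumptions copied from the configuration quoted earlier in this
thread; the key change is dropping --deploy-mode cluster and
-Dspark.submit.deployMode=cluster):

# spark home
export SPARK_HOME=/usr/local/spark

# master URL of the standalone cluster
export MASTER="spark://<master>:7077"

# options passed to spark-submit; no --deploy-mode cluster here, so the
# driver runs inside the Zeppelin container (client mode)
export SPARK_SUBMIT_OPTIONS="--packages com.databricks:spark-csv_2.11:1.5.0"

# driver memory for the client-mode driver
export ZEPPELIN_JAVA_OPTS="-Dspark.driver.memory=7g"

Note that in client mode the driver reads local paths such as /data/01.csv
from the Zeppelin container's filesystem, so the /data directory would need
to be mounted there (or replaced by a shared filesystem) for the notebook in
this thread to work.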

Thanks,
moon

On Wed, May 3, 2017 at 2:28 PM Sofiane Cherchalli <so...@gmail.com>
wrote:

> Hi Moon,
>
> So in my case, if I have a standalone or YARN cluster, the workaround
> would be to install Zeppelin alongside every worker, proxy them, and run
> each Zeppelin in client mode?
>
> Thanks,
> Sofiane
>
> On Wed, May 3, 2017 at 7:12 PM, moon soo Lee <mo...@apache.org> wrote:
>
>> Hi,
>>
>> Zeppelin does not support cluster-mode deployment at the moment.
>> Fortunately, there will be support for cluster mode soon!
>> Please keep an eye on https://issues.apache.org/jira/browse/ZEPPELIN-2040.
>>
>> Thanks,
>> moon
>>
>> On Wed, May 3, 2017 at 11:00 AM Sofiane Cherchalli <so...@gmail.com>
>> wrote:
>>
>>> Shall I configure a remote interpreter to my notebook to run on the
>>> worker?
>>>
>>> Mayday!
>>>
>>> On Wed, May 3, 2017 at 4:18 PM, Sofiane Cherchalli <so...@gmail.com>
>>> wrote:
>>>
>>>> What port does the remote interpreter use?
>>>>

Re: Running a notebook in a standalone cluster mode issues

Posted by Sofiane Cherchalli <so...@gmail.com>.
Hi Moon,

So in my case, if I have a standalone or YARN cluster, the workaround would
be to install Zeppelin alongside every worker, proxy them, and run each
Zeppelin in client mode?

Thanks,
Sofiane

On Wed, May 3, 2017 at 7:12 PM, moon soo Lee <mo...@apache.org> wrote:

> Hi,
>
> Zeppelin does not support cluster-mode deployment at the moment.
> Fortunately, there will be support for cluster mode soon!
> Please keep an eye on https://issues.apache.org/jira/browse/ZEPPELIN-2040.
>
> Thanks,
> moon
>
> On Wed, May 3, 2017 at 11:00 AM Sofiane Cherchalli <so...@gmail.com>
> wrote:
>
>> Shall I configure a remote interpreter to my notebook to run on the
>> worker?
>>
>> Mayday!
>>
>> On Wed, May 3, 2017 at 4:18 PM, Sofiane Cherchalli <so...@gmail.com>
>> wrote:
>>
>>> What port does the remote interpreter use?
>>>

Re: Running a notebook in a standalone cluster mode issues

Posted by moon soo Lee <mo...@apache.org>.
Hi,

Zeppelin does not support cluster-mode deploys at the moment. Fortunately,
support for cluster mode is coming soon!
Please keep an eye on https://issues.apache.org/jira/browse/ZEPPELIN-2040.
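
Until then, one workaround is to keep the driver in client mode so that it
runs inside the Zeppelin container. A minimal zeppelin-env.sh sketch under
that assumption (it reuses the paths and package version from the
configuration earlier in this thread, and it means /data must be reachable
from the Zeppelin container, for example through a shared volume, since only
the executors run on the worker):

# zeppelin-env.sh, client-mode sketch: drop "--deploy-mode cluster" and
# -Dspark.submit.deployMode=cluster so the driver stays with Zeppelin
export SPARK_HOME=/usr/local/spark
export HADOOP_CONF_DIR=/opt/hadoop-2.7.3/etc/hadoop
export SPARK_SUBMIT_OPTIONS="--packages com.databricks:spark-csv_2.11:1.5.0"
export ZEPPELIN_JAVA_OPTS="-Dspark.driver.memory=7g"
export MASTER="spark://<master>:7077"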

Thanks,
moon

Re: Running a notebook in a standalone cluster mode issues

Posted by Sofiane Cherchalli <so...@gmail.com>.
Shall I configure a remote interpreter for my notebook so that it runs on the worker?

Mayday!

Re: Running a notebook in a standalone cluster mode issues

Posted by Sofiane Cherchalli <so...@gmail.com>.
What port does the remote interpreter use?
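
From the driver stderr quoted earlier in the thread ("Starting remote
interpreter server on port 46151"), the port appears to be chosen per launch
rather than fixed. A quick way to find the current one, assuming the worker's
default work directory seen in the launch command:

grep -r "Starting remote interpreter server" /usr/local/spark/work/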
