Posted to users@zeppelin.apache.org by Beth Lee <be...@gmail.com> on 2017/02/24 06:44:56 UTC
How can I use pyspark in zeppelin?
I installed spark-2.1.0-bin-hadoop2.7.tgz and zeppelin-0.7.0-bin-all.tgz on
Ubuntu.
I set zeppelin-env.sh as below.
export PYTHONPATH=/usr/bin/python
export PYSPARK_PYTHON=/home/jin/spark/python
Then I try to use PySpark in the Zeppelin notebook.
%spark.pyspark
print(2+2)
The following errors occur in the Zeppelin notebook.
java.lang.NullPointerException
at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:38)
at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:33)
at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext_2(SparkInterpreter.java:380)
at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:369)
at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:144)
at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:817)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
at org.apache.zeppelin.spark.PySparkInterpreter.getSparkInterpreter(PySparkInterpreter.java:546)
at org.apache.zeppelin.spark.PySparkInterpreter.createGatewayServerAndStartScript(PySparkInterpreter.java:206)
at org.apache.zeppelin.spark.PySparkInterpreter.open(PySparkInterpreter.java:160)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:482)
at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
I don't know why these errors occur.
Could you give me some advice?
Re: How can I use pyspark in zeppelin?
Posted by Hyung Sung Shim <hs...@nflabs.com>.
Hi.
I tested in the same environment as yours.
I think you don't need to set PYSPARK_PYTHON, so could you remove
export PYSPARK_PYTHON=/home/jin/spark/python from your zeppelin-env.sh
and retry?
Let me share my configuration, FYI.
export PYTHONPATH=/usr/bin/python
export SPARK_HOME=/usr/lib/spark-2.1.0-bin-hadoop2.7
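One note on the two variables: PYSPARK_PYTHON, when set, should point at a Python executable, not at Spark's python/ source directory, and PYTHONPATH conventionally lists module directories rather than a binary. A minimal zeppelin-env.sh sketch along those lines (all paths here are assumptions, adjust them to your own install):

```shell
# Sketch of conf/zeppelin-env.sh -- every path below is an assumption,
# adjust to where Spark and Python actually live on your machine.

# Spark install dir: Zeppelin derives the rest from this.
export SPARK_HOME=/usr/lib/spark-2.1.0-bin-hadoop2.7

# Optional: only if you need a specific interpreter. Must be an
# executable, not a directory such as $SPARK_HOME/python.
# export PYSPARK_PYTHON=/usr/bin/python
```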
2017-02-24 20:03 GMT+09:00 Beth Lee <be...@gmail.com>:
> Yes, I already registered it.
> But the result is same.
>
> Thanks,
> Jin
>
> 2017-02-24 18:05 GMT+09:00 Hyung Sung Shim <hs...@nflabs.com>:
>
>> Hello.
>> Could you set the spark-2.1.0-bin-hadoop2.7 path as SPARK_HOME?
>> You can refer to http://zeppelin.apache.org/docs/0.7.0/interpreter/spark.html#1-export-spark_home.
>>
>> 2017-02-24 15:44 GMT+09:00 Beth Lee <be...@gmail.com>:
>>> [original message quoted in full; trimmed]
>
Re: How can I use pyspark in zeppelin?
Posted by Beth Lee <be...@gmail.com>.
Yes, I already registered it.
But the result is the same.
Thanks,
Jin
2017-02-24 18:05 GMT+09:00 Hyung Sung Shim <hs...@nflabs.com>:
> Hello.
> Could you set the spark-2.1.0-bin-hadoop2.7 path as SPARK_HOME?
> You can refer to http://zeppelin.apache.org/docs/0.7.0/interpreter/spark.html#1-export-spark_home.
>
> 2017-02-24 15:44 GMT+09:00 Beth Lee <be...@gmail.com>:
>> [original message quoted in full; trimmed]
Re: How can I use pyspark in zeppelin?
Posted by Hyung Sung Shim <hs...@nflabs.com>.
Hello.
Could you set the spark-2.1.0-bin-hadoop2.7 path as SPARK_HOME?
You can refer to
http://zeppelin.apache.org/docs/0.7.0/interpreter/spark.html#1-export-spark_home.
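If it helps, here is a quick sanity check that SPARK_HOME points at a real Spark install; the path below is only an example for this tarball, use wherever you extracted it:

```shell
# Example path -- substitute the directory you extracted the tarball into.
export SPARK_HOME=/usr/lib/spark-2.1.0-bin-hadoop2.7

# A valid SPARK_HOME should contain bin/spark-submit; if this fails,
# the path is wrong and Zeppelin's Spark interpreter cannot start.
ls "$SPARK_HOME/bin/spark-submit" && echo "SPARK_HOME looks OK"
```

Remember to restart the Zeppelin daemon (or at least the Spark interpreter) after editing zeppelin-env.sh so the new value is picked up.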
2017-02-24 15:44 GMT+09:00 Beth Lee <be...@gmail.com>:
> [original message quoted in full; trimmed]