Posted to users@zeppelin.apache.org by Ian Maloney <ra...@gmail.com> on 2016/02/12 17:04:51 UTC

Can't get Pyspark(1.4.1) interpreter to work on Zeppelin(0.6)

Hi,

I've been trying unsuccessfully to configure the pyspark interpreter on
Zeppelin. I can use pyspark from the CLI and can use the Spark interpreter
from Zeppelin without issue. Here are the lines which aren't commented out
in my zeppelin-env.sh file:

export MASTER=yarn-client

export ZEPPELIN_PORT=8090

export ZEPPELIN_JAVA_OPTS="-Dhdp.version=2.3.2.0-2950 -Dspark.yarn.queue=default"

export SPARK_HOME=/usr/hdp/current/spark-client/

export HADOOP_CONF_DIR=/etc/hadoop/conf

export PYSPARK_PYTHON=/usr/bin/python

export PYTHONPATH=${SPARK_HOME}/python:${SPARK_HOME}/python/build:$PYTHONPATH
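
As a sanity check, pyspark imports fine on the driver side. With the PYTHONPATH above, this works from a plain python shell on the same node (a minimal sketch; the paths are just the ones from my setup above):

import pyspark
# resolves to ${SPARK_HOME}/python/pyspark/__init__.py on the driver,
# so the failure described next is specific to the YARN worker side
print(pyspark.__file__)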

Running a simple pyspark script in the interpreter gives this error:

Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 5, some_yarn_node.networkname): org.apache.spark.SparkException:
Error from python worker:
  /usr/bin/python: No module named pyspark
PYTHONPATH was:
  /app/hadoop/yarn/local/usercache/my_username/filecache/4121/spark-assembly-1.4.1.2.3.2.0-2950-hadoop2.7.1.2.3.2.0-2950.jar
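
By "simple" I mean something along these lines (a hypothetical sketch, not my exact notebook; any action that ships work to the YARN executors triggers the same failure):

%pyspark
# the job only fails once a task reaches an executor, because the
# worker's python cannot import pyspark (see PYTHONPATH in the trace)
rdd = sc.parallelize(range(10))
print(rdd.map(lambda x: x * 2).sum())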

More details can be found here:
https://community.hortonworks.com/questions/16436/cants-get-pyspark-interpreter-to-work-on-zeppelin.html

Thanks,

Ian

Re: Can't get Pyspark(1.4.1) interpreter to work on Zeppelin(0.6)

Posted by mina lee <mi...@apache.org>.
Thank you for verifying. Glad to help :)

Re: Can't get Pyspark(1.4.1) interpreter to work on Zeppelin(0.6)

Posted by Ian Maloney <ra...@gmail.com>.
Hi Mina,

I added your changes and they got the pyspark interpreter working! Thanks
so much for your help!

Ian

Re: Can't get Pyspark(1.4.1) interpreter to work on Zeppelin(0.6)

Posted by mina lee <mi...@apache.org>.
Hi Ian, sorry for the late reply.
I was able to reproduce the same error with Spark 1.4.1 & Hadoop 2.6.0. It turned out to be a bug in Zeppelin.
After some digging, I realized that the `spark.yarn.isPython` property was only introduced in Spark 1.5.0. I just made a PR (https://github.com/apache/incubator-zeppelin/pull/736) to fix it. It would be much appreciated if you could try it and see if it works. Thank you for reporting the bug!

Regards,
Mina

Re: Can't get Pyspark(1.4.1) interpreter to work on Zeppelin(0.6)

Posted by Ian Maloney <ra...@gmail.com>.
Hi Mina,

Thanks for the response. I re-cloned master from GitHub and built using:
mvn clean package -DskipTests -Pspark-1.4 -Phadoop-2.6 -Pyarn -Ppyspark

I did that locally, then scp'd the build to a node in a cluster running HDP 2.3 (Spark 1.4.1 & Hadoop 2.7.1).

I added the two config files from my earlier message and started the Zeppelin daemon. Inspecting the spark.yarn.isPython config in the Spark UI showed it to be "true".

The pyspark interpreter gives the same error as before. Are there any other configs I should check? I'm beginning to wonder if it's related to something in Hortonworks' distribution of Spark or YARN.
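
For what it's worth, a quick way to see what the workers' Python actually has on its path is a paragraph like this (a rough sketch of my own, nothing official; it forces a single task onto an executor and ships its sys.path back):

%pyspark
import sys
# one task, one partition: bring back the executor's module search path.
# With the bug present this fails with the same "No module named pyspark";
# on a healthy setup the list should include pyspark.zip and py4j-*.zip.
print(sc.parallelize([0], 1).map(lambda _: sys.path).collect()[0])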


Re: Can't get Pyspark(1.4.1) interpreter to work on Zeppelin(0.6)

Posted by mina lee <mi...@apache.org>.
Hi Ian,

The stack trace looks quite similar to https://issues.apache.org/jira/browse/ZEPPELIN-572, which has been fixed since v0.5.6.
This happens when pyspark.zip and py4j-*.zip are not distributed to the YARN worker nodes.

If you are building from source, can you please double-check that you pulled the latest master?
And, to be sure, can you confirm that you see spark.yarn.isPython set to true in the Spark UI (YARN's ApplicationMaster UI) > Environment > Spark Properties?
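
If it is easier, you can also read the property from a notebook paragraph. A driver-side call like this sends no work to the Python workers, so it should run even while the executors are failing (a rough sketch, assuming SparkContext.getConf() is available in your PySpark version):

%pyspark
# driver-side only: inspect the conf the interpreter was launched with
print(sc.getConf().get("spark.yarn.isPython", "not set"))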
