Posted to users@zeppelin.apache.org by Ruslan Dautkhanov <da...@gmail.com> on 2018/10/26 18:25:58 UTC

switching between python2 and 3 for %pyspark

I'd like to give users the ability to switch between Python 2 and Python 3 for
their PySpark jobs.
Has anyone been able to set up something like this, so that users can switch
between python2 and python3 pyspark interpreters?

For this experiment, I created a new %py3spark interpreter and assigned it to
the spark interpreter group.

I added the following options there for %py3spark: [1]
/opt/cloudera/parcels/Anaconda3 is our Anaconda Python 3 home; it is
available on all worker nodes and on the Zeppelin server as well.

The default %pyspark interpreter is configured very similarly to [1], except
that all paths have "/opt/cloudera/parcels/Anaconda" instead of
"/opt/cloudera/parcels/Anaconda3".

Nevertheless, zeppelin_ipythonxxx/ipython_server.py
seems to pick up environment variables from zeppelin-env.sh rather than from
the interpreter settings.

The Zeppelin documentation says that all-uppercase properties are
treated as environment variables, so I assume they should override what's in
zeppelin-env.sh, no?

It seems environment variables at the interpreter level are broken - notice
that the "pyspark" paragraph has "Anaconda3" and not "Anaconda" in its PATH
(highlighted).

[image: image.png]
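
As a quick sanity check (a minimal sketch, assuming the stock interpreter
bindings Zeppelin provides), a paragraph like this under each interpreter
shows which Python binary the driver actually picked up:

%pyspark
import sys
# which Python is the Zeppelin driver process running on?
print(sys.version)
print(sys.executable)

Under %py3spark the same paragraph should report the Anaconda3 path.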



[1]

LD_LIBRARY_PATH  /opt/cloudera/parcels/Anaconda3/lib
PATH
/usr/java/latest/bin:/opt/cloudera/parcels/Anaconda3/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/rdautkha/bin
PYSPARK_DRIVER_PYTHON  /opt/cloudera/parcels/Anaconda3/bin/python
PYSPARK_PYTHON  /opt/cloudera/parcels/Anaconda3/bin/python
PYTHONHOME  /opt/cloudera/parcels/Anaconda3

spark.executorEnv.LD_LIBRARY_PATH  /opt/cloudera/parcels/Anaconda3/lib
spark.executorEnv.PYSPARK_PYTHON  /opt/cloudera/parcels/Anaconda3/bin/python
spark.pyspark.driver.python  /opt/cloudera/parcels/Anaconda3/bin/python
spark.pyspark.python  /opt/cloudera/parcels/Anaconda3/bin/python
spark.yarn.appMasterEnv.PYSPARK_PYTHON
/opt/cloudera/parcels/Anaconda3/bin/python
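
Since several of the settings in [1] only take effect on the executors, a
sketch like the following (the one-element RDD is just a vehicle to run code
on an executor, and sc is the SparkContext Zeppelin injects) confirms what
the executors picked up:

%py3spark
import sys
# driver-side binary
print("driver:", sys.executable)
# executor-side binary: run one trivial task and collect the answer
print("executors:", sc.parallelize([0], 1).map(lambda _: sys.executable).collect())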

-- 
Ruslan Dautkhanov

Re: switching between python2 and 3 for %pyspark

Posted by Ruslan Dautkhanov <da...@gmail.com>.
Jeff, I can't reproduce what you have.
It seems Zeppelin's spark.conf can only add environment variables that are
missing?
If, for example, a variable was already defined in zeppelin-env.sh, then
spark.conf would only add new variables, but not overwrite variables that
were already defined in zeppelin-env.sh or somewhere else?

[image: image.png]
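
To see which values the interpreter process actually inherited, a minimal
sketch (using only the variables discussed in this thread):

%py3spark
import os
# what did this interpreter process actually inherit?
for key in ("PATH", "PYSPARK_PYTHON", "PYSPARK_DRIVER_PYTHON",
            "PYTHONHOME", "LD_LIBRARY_PATH"):
    print(key, "=", os.environ.get(key))

Whatever prints here is what ipython_server.py ends up seeing.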


Thank you,
Ruslan Dautkhanov



Re: switching between python2 and 3 for %pyspark

Posted by Jeff Zhang <zj...@gmail.com>.
BTW, it is better to use PYSPARK_PYTHON instead of PYSPARK_DRIVER_PYTHON
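
For reference, the difference (standard Spark behavior, not Zeppelin-specific):
PYSPARK_PYTHON sets the Python binary for both the driver and the executors,
while PYSPARK_DRIVER_PYTHON overrides it on the driver only - so setting only
the latter can leave the driver and the executors on different Python versions:

PYSPARK_PYTHON         /opt/cloudera/parcels/Anaconda3/bin/python   # driver and executors
PYSPARK_DRIVER_PYTHON  /opt/cloudera/parcels/Anaconda3/bin/python   # driver only, wins on the driver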



Re: switching between python2 and 3 for %pyspark

Posted by Jeff Zhang <zj...@gmail.com>.
Regarding your screenshot, I notice that %pyspark also specifies
PYSPARK_DRIVER_PYTHON to point to python3. That's why you get python3 in
%pyspark.

Not sure how you set these properties. Here's what I did, and it works for
me: %spark.pyspark uses python3 and %spark2.pyspark uses python2.


[image: image.png]



Re: switching between python2 and 3 for %pyspark

Posted by Ruslan Dautkhanov <da...@gmail.com>.
Jeff, nope, we were not changing the Zeppelin source code.
We just used the existing Zeppelin UI to add another interpreter to the Spark
group.
We called it "py3spark" for Python 3 Spark. It's the same thing, just with all
the environment and configuration variables updated to point to the Python 3
home on both the driver and the executors' side. Thanks.

-- 
Ruslan Dautkhanov



Re: switching between python2 and 3 for %pyspark

Posted by Jeff Zhang <zj...@gmail.com>.
IIUC, you changed the Spark interpreter source code by adding another
py3spark interpreter, is that right?

>>> Nevertheless, zeppelin_ipythonxxx/ipython_server.py
>>> seems to pick up environment variables from zeppelin-env.sh rather than
>>> from the interpreter settings.
Zeppelin reads environment variables from both zeppelin-env.sh and the
interpreter settings, and a variable set in the interpreter settings should
override one defined in zeppelin-env.sh. If not, then it might be a bug.
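
For example (a sketch of the intended precedence; the assumption here is that
zeppelin-env.sh carries the python2 default):

# zeppelin-env.sh - the global default
export PYSPARK_PYTHON=/opt/cloudera/parcels/Anaconda/bin/python

# py3spark interpreter setting - should win for that interpreter
PYSPARK_PYTHON  /opt/cloudera/parcels/Anaconda3/bin/python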


Re: switching between python2 and 3 for %pyspark

Posted by Ruslan Dautkhanov <da...@gmail.com>.
Thanks Jeff, yep, that's what we have too - [1] is what we currently have in
the interpreter settings.
It doesn't work for some reason.
We're running Zeppelin from a ~May 2018 snapshot - has anything changed since
then?


Ruslan




[1]

LD_LIBRARY_PATH  /opt/cloudera/parcels/Anaconda3/lib
PATH
/usr/java/latest/bin:/opt/cloudera/parcels/Anaconda3/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/rdautkha/bin
PYSPARK_DRIVER_PYTHON  /opt/cloudera/parcels/Anaconda3/bin/python
PYSPARK_PYTHON  /opt/cloudera/parcels/Anaconda3/bin/python
PYTHONHOME  /opt/cloudera/parcels/Anaconda3

spark.executorEnv.LD_LIBRARY_PATH  /opt/cloudera/parcels/Anaconda3/lib
spark.executorEnv.PYSPARK_PYTHON  /opt/cloudera/parcels/Anaconda3/bin/python
spark.pyspark.driver.python  /opt/cloudera/parcels/Anaconda3/bin/python
spark.pyspark.python  /opt/cloudera/parcels/Anaconda3/bin/python
spark.yarn.appMasterEnv.PYSPARK_PYTHON
/opt/cloudera/parcels/Anaconda3/bin/python


-- 
Ruslan Dautkhanov



Re: switching between python2 and 3 for %pyspark

Posted by Jeff Zhang <zj...@gmail.com>.
Hi Ruslan,

I believe you can just set PYSPARK_PYTHON in the spark interpreter settings to
switch between python2 and python3.
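
For example, two interpreter settings side by side (a sketch only, using the
Anaconda parcel paths from this thread):

# "spark" interpreter setting -> %spark.pyspark runs python3
PYSPARK_PYTHON  /opt/cloudera/parcels/Anaconda3/bin/python

# "spark2" interpreter setting -> %spark2.pyspark runs python2
PYSPARK_PYTHON  /opt/cloudera/parcels/Anaconda/bin/python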


