
Posted to users@zeppelin.apache.org by Manuel Sopena Ballesteros <ma...@garvan.org.au> on 2018/06/08 01:26:04 UTC

how to load pandas into pyspark (centos 6 with python 2.6)

Dear Zeppelin community,

I am trying to load pandas into my Zeppelin %spark2.pyspark interpreter. The system I am using is CentOS 6 with Python 2.6, so I can't install pandas system-wide through pip as suggested in the documentation.

What can I do if I want to add modules to the %spark2.pyspark interpreter?

Thank you very much

Manuel Sopena Ballesteros | Big data Engineer
Garvan Institute of Medical Research
The Kinghorn Cancer Centre, 370 Victoria Street, Darlinghurst, NSW 2010
T: +61 (0)2 9355 5760 | F: +61 (0)2 9295 8507 | E: manuel.sb@garvan.org.au

NOTICE
Please consider the environment before printing this email. This message and any attachments are intended for the addressee named and may contain legally privileged/confidential/copyright information. If you are not the intended recipient, you should not read, use, disclose, copy or distribute this communication. If you have received this message in error please notify us at once by return email and then delete both messages. We accept no liability for the distribution of viruses or similar in electronic communications. This notice should not be removed.

Re: how to load pandas into pyspark (centos 6 with python 2.6)

Posted by Jeff Zhang <zj...@gmail.com>.

The error message is clear: it is due to the folder permissions. Try doing
that as the root user.
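
The pip output quoted in this thread warns that '/home/zeppelin/.cache/pip' is not owned by the current user. A minimal sketch for checking that from Python is shown here; the helper name owned_by_current_user is hypothetical, and it assumes a POSIX system such as the CentOS host discussed in this thread.

```python
import os


def owned_by_current_user(path):
    """Return True if `path` exists and is owned by the user running this code."""
    try:
        # st_uid is the numeric owner of the file or directory.
        return os.stat(path).st_uid == os.getuid()
    except OSError:
        # Missing or unreadable paths are reported as not owned.
        return False


if __name__ == "__main__":
    # Cache directories named in the pip warning from this thread.
    for cache_dir in ("/home/zeppelin/.cache/pip",
                      "/home/zeppelin/.cache/pip/http"):
        print(cache_dir, owned_by_current_user(cache_dir))
```

If the check prints False for a directory that exists, the ownership warning from pip is expected for that user.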



(Replying to the message Manuel Sopena Ballesteros <ma...@garvan.org.au> wrote on Tuesday, June 12, 2018 at 7:42 AM.)


RE: how to load pandas into pyspark (centos 6 with python 2.6)

Posted by Manuel Sopena Ballesteros <ma...@garvan.org.au>.

OK, this is what I am getting:

$ /tmp/pythonvenv/bin/pip install pandas

The directory '/home/zeppelin/.cache/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
pip is configured with locations that require TLS/SSL, however the ssl module in Python is not available.
The directory '/home/zeppelin/.cache/pip' or its parent directory is not owned by the current user and caching wheels has been disabled. check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
Collecting pandas
  Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError("Can't connect to HTTPS URL because the SSL module is not available.",)': /simple/pandas/
  Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError("Can't connect to HTTPS URL because the SSL module is not available.",)': /simple/pandas/
  Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError("Can't connect to HTTPS URL because the SSL module is not available.",)': /simple/pandas/
  Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError("Can't connect to HTTPS URL because the SSL module is not available.",)': /simple/pandas/
  Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError("Can't connect to HTTPS URL because the SSL module is not available.",)': /simple/pandas/
  Could not find a version that satisfies the requirement pandas (from versions: )
No matching distribution found for pandas
  Could not fetch URL https://pypi.python.org/simple/pandas/: There was a problem confirming the ssl certificate: HTTPSConnectionPool(host='pypi.python.org', port=443): Max retries exceeded with url: /simple/pandas/ (Caused by SSLError("Can't connect to HTTPS URL because the SSL module is not available.",)) - skipping

Manuel
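
The log above also reports that "the ssl module in Python is not available", which pip needs in order to reach an HTTPS package index. A minimal sketch for confirming this inside a given interpreter follows; the helper name ssl_available is hypothetical. When it returns False for a Python built from source, a common cause is that the OpenSSL development headers were missing at build time, though whether that applies to this machine is an assumption.

```python
import importlib
import sys


def ssl_available():
    """Return True if this interpreter can import the standard ssl module."""
    try:
        importlib.import_module("ssl")
        return True
    except ImportError:
        # Raised when Python was built without the underlying _ssl extension.
        return False


if __name__ == "__main__":
    print(sys.executable, "ssl available:", ssl_available())
```

The script has to be run with the interpreter that owns the failing pip (for a virtualenv at /tmp/pythonvenv that would presumably be /tmp/pythonvenv/bin/python, which is an assumption about the layout).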

(Replying to the message Jeff Zhang <zjffdu@gmail.com> sent to users@zeppelin.apache.org on Friday, June 8, 2018 at 2:54 PM.)



Re: how to load pandas into pyspark (centos 6 with python 2.6)

Posted by Jeff Zhang <zj...@gmail.com>.

Just find pip in your Python 3.6 folder and run it using its full path, e.g.

/tmp/Python-3.6.5/pip install pandas
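
One way to make sure pip installs into the same interpreter that Zeppelin uses is to invoke pip as a module of that interpreter. The following is a sketch, assuming the interpreter path /tmp/Python-3.6.5/python mentioned elsewhere in this thread and that pip is installed for it; the helper names pip_install_command and install are hypothetical.

```python
import subprocess


def pip_install_command(python_path, package):
    """Build the argument list that runs pip under a specific interpreter."""
    # "python -m pip" runs the pip that belongs to that interpreter.
    return [python_path, "-m", "pip", "install", package]


def install(python_path, package):
    """Run the install and return pip's exit code (0 means success)."""
    return subprocess.call(pip_install_command(python_path, package))


if __name__ == "__main__":
    # Path taken from the thread; adjust it to the actual installation.
    print(" ".join(pip_install_command("/tmp/Python-3.6.5/python", "pandas")))
```

Building the command as a list avoids shell quoting issues, and calling install() only when needed keeps the sketch safe to import.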

(Replying to the message Manuel Sopena Ballesteros <ma...@garvan.org.au> wrote on Friday, June 8, 2018 at 12:47 PM.)


RE: how to load pandas into pyspark (centos 6 with python 2.6)

Posted by Manuel Sopena Ballesteros <ma...@garvan.org.au>.

Sorry for the stupid question

How can I use pip? Zeppelin will run pip through the shell interpreter, but my system-wide Python is 2.6…


[inline image: image002.jpg]

thanks

Manuel

(Replying to the message Jeff Zhang <zjffdu@gmail.com> sent to users@zeppelin.apache.org on Friday, June 8, 2018 at 1:45 PM.)



Re: how to load pandas into pyspark (centos 6 with python 2.6)

Posted by Jeff Zhang <zj...@gmail.com>.

pip should be available under your Python 3.6.5 installation; you can use it
to install pandas.


(Replying to the message Manuel Sopena Ballesteros <ma...@garvan.org.au> wrote on Friday, June 8, 2018 at 11:40 AM.)


RE: how to load pandas into pyspark (centos 6 with python 2.6)

Posted by Manuel Sopena Ballesteros <ma...@garvan.org.au>.

Hi Jeff,

Thank you very much for your quick response. My Zeppelin is deployed using HDP (Hortonworks Data Platform), so I already have Spark/YARN integration, and I am using zeppelin.pyspark.python to tell PySpark to run Python 3.6:

zeppelin.pyspark.python --> /tmp/Python-3.6.5/python

I do have root access to the machine, but the OS is CentOS 6 (the system Python environment is 2.6), hence pip is not available.

Thank you

Manuel

(Replying to the message Jeff Zhang <zjffdu@gmail.com> sent to users@zeppelin.apache.org on Friday, June 8, 2018 at 11:47 AM.)



Re: how to load pandas into pyspark (centos 6 with python 2.6)

Posted by Jeff Zhang <zj...@gmail.com>.

First, I would suggest using Python 2.7 or Python 3.x, because Spark 2.x
has dropped support for Python 2.6.
Second, you need to configure PYSPARK_PYTHON in the Spark interpreter
settings to point to the Python that you installed. (I don't know what you
mean when you say you can't install pandas system-wide.) Do you mean you are
not root and don't have permission to install Python packages?
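
After pointing PYSPARK_PYTHON at a new interpreter, it can help to confirm from a notebook paragraph which Python is actually running and whether pandas can be found. Here is a minimal sketch using only the standard library (Python 3.4 or later); the helper name describe_python_env is hypothetical, and it reports on the interpreter that executes it, not on remote executors.

```python
import importlib.util
import sys


def describe_python_env(module_name="pandas"):
    """Report the running interpreter and whether a module can be found."""
    return {
        "executable": sys.executable,
        "version": "%d.%d.%d" % sys.version_info[:3],
        # find_spec returns None when a top-level module is not installed.
        "module_found": importlib.util.find_spec(module_name) is not None,
    }


if __name__ == "__main__":
    print(describe_python_env())
```

In Zeppelin, the function could be pasted into a %spark2.pyspark paragraph and called with print(describe_python_env()) to see whether the configured interpreter is the one in use.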



(Replying to the message Manuel Sopena Ballesteros <ma...@garvan.org.au> wrote on Friday, June 8, 2018 at 9:26 AM.)
