Posted to user@spark.apache.org by Ian Stokes Rees <ij...@continuum.io> on 2016/09/02 02:56:58 UTC

PySpark: preference for Python 2.7 or Python 3.5?

I have the option of running PySpark with Python 2.7 or Python 3.5. I am 
fairly expert with Python and know the Python-side history of the 
differences.  All else being the same, I have a preference for Python 
3.5.  I'm using CDH 5.8 and I'm wondering if that biases whether I 
should proceed with PySpark on top of Python 2.7 or 3.5. Opinions?  Does 
Cloudera have an official (or unofficial) position on this?

Thanks,

Ian
_______________________________
Ian Stokes-Rees
Computational Scientist

Continuum Analytics <http://continuum.io>
@ijstokes Twitter <http://twitter.com/ijstokes> LinkedIn <http://linkedin.com/in/ijstokes> Github <http://github.com/ijstokes> 617.942.0218

Re: PySpark: preference for Python 2.7 or Python 3.5?

Posted by Ian Stokes Rees <ij...@continuum.io>.
On 9/2/16 3:47 AM, Felix Cheung wrote:
> There is an Anaconda parcel one could readily install on CDH
>
> https://docs.continuum.io/anaconda/cloudera
>
> As Sean says it is Python 2.7.x.
>
> Spark should work for both 2.7 and 3.5.

Yes, I'm actually an engineer at Continuum, so I know the Anaconda 
parcel pretty well.  It is more a question of whether CDH and Spark 
"work" better with PySpark on Python 2.7 or Python 3.5.  My sense was 
"you choose: both are fine", but I wanted to ask here before committing 
to going down one path or another.

Thanks,

Ian

Re: PySpark: preference for Python 2.7 or Python 3.5?

Posted by Felix Cheung <fe...@hotmail.com>.
There is an Anaconda parcel one could readily install on CDH

https://docs.continuum.io/anaconda/cloudera

As Sean says it is Python 2.7.x.

Spark should work for both 2.7 and 3.5.
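
For anyone who wants to try both interpreters, here is a minimal sketch of pointing PySpark at a specific Python. PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are the environment variables Spark reads to pick the worker and driver interpreters. The helper name pyspark_env and the interpreter path are hypothetical; the real path depends on where the parcel or environment is installed on your cluster.

```python
import os


def pyspark_env(python_path, base_env=None):
    """Return a copy of the environment with PySpark pointed at python_path.

    pyspark_env is a hypothetical helper, not part of Spark.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Interpreter used by the executors (and by the driver unless overridden).
    env["PYSPARK_PYTHON"] = python_path
    # Interpreter used by the driver process.
    env["PYSPARK_DRIVER_PYTHON"] = python_path
    return env


# Assumed install location for a Python 3.5 environment; adjust for your cluster.
env = pyspark_env("/opt/envs/py35/bin/python", base_env={"PATH": "/usr/bin"})
print(env["PYSPARK_PYTHON"])
```

The resulting dictionary could then be passed as the env argument when launching spark-submit through subprocess, so the choice of 2.7 or 3.5 is made per job rather than cluster-wide.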

_____________________________
From: Sean Owen <so...@cloudera.com>
Sent: Friday, September 2, 2016 12:41 AM
Subject: Re: PySpark: preference for Python 2.7 or Python 3.5?
To: Ian Stokes Rees <ij...@continuum.io>
Cc: user @spark <us...@spark.apache.org>


Spark should work fine with Python 3. I'm not a Python person, but all else equal I'd use 3.5 too. I assume the issue could be libraries you want that don't support Python 3. I don't think that changes with CDH. It includes a version of Anaconda from Continuum, but that lays down Python 2.7.11. I don't believe there's any particular position on 2 vs 3.

On Fri, Sep 2, 2016 at 3:56 AM, Ian Stokes Rees <ij...@continuum.io> wrote:
I have the option of running PySpark with Python 2.7 or Python 3.5.  I am fairly expert with Python and know the Python-side history of the differences.  All else being the same, I have a preference for Python 3.5.  I'm using CDH 5.8 and I'm wondering if that biases whether I should proceed with PySpark on top of Python 2.7 or 3.5.  Opinions?  Does Cloudera have an official (or unofficial) position on this?

Thanks,

Ian
_______________________________
Ian Stokes-Rees
Computational Scientist

[Continuum Analytics]<http://continuum.io>
@ijstokes [Twitter] <http://twitter.com/ijstokes>  [LinkedIn] <http://linkedin.com/in/ijstokes>  [Github] <http://github.com/ijstokes>  617.942.0218

Re: PySpark: preference for Python 2.7 or Python 3.5?

Posted by Sean Owen <so...@cloudera.com>.
Spark should work fine with Python 3. I'm not a Python person, but all else
equal I'd use 3.5 too. I assume the issue could be libraries you want that
don't support Python 3. I don't think that changes with CDH. It includes a
version of Anaconda from Continuum, but that lays down Python 2.7.11. I
don't believe there's any particular position on 2 vs 3.
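
If library support is the deciding factor, one rough way to check is to run a small script under each candidate interpreter and see which of the modules you need fail to import. This is only a sketch: missing_modules is a hypothetical helper, the module names are placeholders, and a clean import does not guarantee that a library behaves correctly on that Python version.

```python
import importlib
import sys


def missing_modules(names):
    """Return the subset of names that cannot be imported by this interpreter."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except (ImportError, SyntaxError):
            # SyntaxError covers Python 2-only source imported under Python 3.
            missing.append(name)
    return missing


# Placeholder list; replace with the libraries your jobs depend on.
wanted = ["json", "no_such_module_for_demo"]
print("Python %d.%d" % sys.version_info[:2])
print(missing_modules(wanted))
```

Running the same script with the 2.7 and the 3.5 interpreter and comparing the two lists gives a quick first answer before committing to either.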

On Fri, Sep 2, 2016 at 3:56 AM, Ian Stokes Rees <ij...@continuum.io>
wrote:

> I have the option of running PySpark with Python 2.7 or Python 3.5.  I am
> fairly expert with Python and know the Python-side history of the
> differences.  All else being the same, I have a preference for Python 3.5.
> I'm using CDH 5.8 and I'm wondering if that biases whether I should proceed
> with PySpark on top of Python 2.7 or 3.5.  Opinions?  Does Cloudera have an
> official (or unofficial) position on this?
>
> Thanks,
>
> Ian
> _______________________________
> Ian Stokes-Rees
> Computational Scientist
>
> [image: Continuum Analytics] <http://continuum.io>
> @ijstokes [image: Twitter] <http://twitter.com/ijstokes> [image: LinkedIn]
> <http://linkedin.com/in/ijstokes> [image: Github]
> <http://github.com/ijstokes> 617.942.0218
>