Posted to user@spark.apache.org by Reynold Xin <rx...@databricks.com> on 2016/01/05 08:17:07 UTC

[discuss] dropping Python 2.6 support

Does anybody here care about us dropping support for Python 2.6 in Spark
2.0?

Python 2.6 is ancient, and is pretty slow in many respects (e.g. JSON
parsing) when compared with Python 2.7. Some libraries that Spark depends
on have stopped supporting 2.6. We could still convince the library
maintainers to support 2.6, but it would be extra work. I'm curious whether
anybody still uses Python 2.6 to run Spark.

Thanks.
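
A quick way to see the JSON gap in practice: Python 2.7 bundled the
C-accelerated json decoder that 2.6 lacked, so timing the same parse loop
under each interpreter makes the difference visible. A minimal sketch (run
it once under python2.6 and once under python2.7; exact numbers will vary
by build):

    import json
    import sys
    import timeit

    # Build a small nested document, then time repeated parses of it.
    doc = json.dumps({"id": 1, "tags": ["a", "b"], "nested": {"x": [1.5] * 50}})
    elapsed = timeit.timeit(lambda: json.loads(doc), number=100000)
    print("Python %s: %.2f s for 100k json.loads calls" % (sys.version[:5], elapsed))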

Re: [discuss] dropping Python 2.6 support

Posted by yash datta <sa...@gmail.com>.
+1

On Tue, Jan 5, 2016 at 1:57 PM, Jian Feng Zhang <jz...@gmail.com>
wrote:

> +1
>
> We use Python 2.7+ and 3.4+ to call PySpark.
>
> 2016-01-05 15:58 GMT+08:00 Kushal Datta <ku...@gmail.com>:
>
>> +1
>>
>> ----
>> Dr. Kushal Datta
>> Senior Research Scientist
>> Big Data Research & Pathfinding
>> Intel Corporation, USA.
>>
>> On Mon, Jan 4, 2016 at 11:52 PM, Jean-Baptiste Onofré <jb...@nanthrax.net>
>> wrote:
>>
>>> +1
>>>
>>> no problem for me to remove Python 2.6 in 2.0.
>>>
>>> Thanks
>>> Regards
>>> JB
>>>
>>>
>>> On 01/05/2016 08:17 AM, Reynold Xin wrote:
>>>
>>>> Does anybody here care about us dropping support for Python 2.6 in Spark
>>>> 2.0?
>>>>
>>>> Python 2.6 is ancient, and is pretty slow in many aspects (e.g. json
>>>> parsing) when compared with Python 2.7. Some libraries that Spark depend
>>>> on stopped supporting 2.6. We can still convince the library maintainers
>>>> to support 2.6, but it will be extra work. I'm curious if anybody still
>>>> uses Python 2.6 to run Spark.
>>>>
>>>> Thanks.
>>>>
>>>>
>>>>
>>> --
>>> Jean-Baptiste Onofré
>>> jbonofre@apache.org
>>> http://blog.nanthrax.net
>>> Talend - http://www.talend.com
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
>>> For additional commands, e-mail: dev-help@spark.apache.org
>>>
>>>
>>
>
>
> --
> Best,
> Jian Feng
>



-- 
When events unfold with calm and ease
When the winds that blow are merely breeze
Learn from nature, from birds and bees
Live your life in love, and let joy not cease.

Re: [discuss] dropping Python 2.6 support

Posted by Jian Feng Zhang <jz...@gmail.com>.
+1

We use Python 2.7+ and 3.4+ to call PySpark.

2016-01-05 15:58 GMT+08:00 Kushal Datta <ku...@gmail.com>:

> +1
>
> ----
> Dr. Kushal Datta
> Senior Research Scientist
> Big Data Research & Pathfinding
> Intel Corporation, USA.
>
> On Mon, Jan 4, 2016 at 11:52 PM, Jean-Baptiste Onofré <jb...@nanthrax.net>
> wrote:
>
>> +1
>>
>> no problem for me to remove Python 2.6 in 2.0.
>>
>> Thanks
>> Regards
>> JB
>>
>>
>> On 01/05/2016 08:17 AM, Reynold Xin wrote:
>>
>>> Does anybody here care about us dropping support for Python 2.6 in Spark
>>> 2.0?
>>>
>>> Python 2.6 is ancient, and is pretty slow in many aspects (e.g. json
>>> parsing) when compared with Python 2.7. Some libraries that Spark depend
>>> on stopped supporting 2.6. We can still convince the library maintainers
>>> to support 2.6, but it will be extra work. I'm curious if anybody still
>>> uses Python 2.6 to run Spark.
>>>
>>> Thanks.
>>>
>>>
>>>
>> --
>> Jean-Baptiste Onofré
>> jbonofre@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
>> For additional commands, e-mail: dev-help@spark.apache.org
>>
>>
>


-- 
Best,
Jian Feng

Re: [discuss] dropping Python 2.6 support

Posted by Kushal Datta <ku...@gmail.com>.
+1

----
Dr. Kushal Datta
Senior Research Scientist
Big Data Research & Pathfinding
Intel Corporation, USA.

On Mon, Jan 4, 2016 at 11:52 PM, Jean-Baptiste Onofré <jb...@nanthrax.net>
wrote:

> +1
>
> no problem for me to remove Python 2.6 in 2.0.
>
> Thanks
> Regards
> JB
>
>
> On 01/05/2016 08:17 AM, Reynold Xin wrote:
>
>> Does anybody here care about us dropping support for Python 2.6 in Spark
>> 2.0?
>>
>> Python 2.6 is ancient, and is pretty slow in many aspects (e.g. json
>> parsing) when compared with Python 2.7. Some libraries that Spark depend
>> on stopped supporting 2.6. We can still convince the library maintainers
>> to support 2.6, but it will be extra work. I'm curious if anybody still
>> uses Python 2.6 to run Spark.
>>
>> Thanks.
>>
>>
>>
> --
> Jean-Baptiste Onofré
> jbonofre@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> For additional commands, e-mail: dev-help@spark.apache.org
>
>

Re: [discuss] dropping Python 2.6 support

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
+1

no problem for me to remove Python 2.6 in 2.0.

Thanks
Regards
JB

On 01/05/2016 08:17 AM, Reynold Xin wrote:
> Does anybody here care about us dropping support for Python 2.6 in Spark
> 2.0?
>
> Python 2.6 is ancient, and is pretty slow in many aspects (e.g. json
> parsing) when compared with Python 2.7. Some libraries that Spark depend
> on stopped supporting 2.6. We can still convince the library maintainers
> to support 2.6, but it will be extra work. I'm curious if anybody still
> uses Python 2.6 to run Spark.
>
> Thanks.
>
>

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: [discuss] dropping Python 2.6 support

Posted by Dmitry Kniazev <kn...@tut.by>.
Sasha, it is more complicated than that: many RHEL 6 OS utilities rely on Python 2.6, and upgrading it to 2.7 breaks the system. For large enterprises, migrating to another server OS means re-certifying (re-testing) hundreds of applications, so yes, they do prefer to stay where they are until the benefits of migrating outweigh the overhead. Long story short: you cannot simply upgrade the built-in Python 2.6 on RHEL 6, and it will take years for enterprises to migrate to RHEL 7.

Having said that, I don't think it is a problem, because Python 2.6 and Python 2.7 can easily co-exist in the same environment. For example, we use virtualenv to run Spark with Python 2.7 and do not touch the system Python 2.6.

Thank you,
Dmitry

09.01.2016, 06:36, "Sasha Kacanski" <sk...@gmail.com>:
> +1
> Companies that use stock python in redhat 2.6 will need to upgrade or install fresh version wich is total of 3.5 minutes so no issues ...
>
> On Tue, Jan 5, 2016 at 2:17 AM, Reynold Xin <rx...@databricks.com> wrote:
>> Does anybody here care about us dropping support for Python 2.6 in Spark 2.0?
>>
>> Python 2.6 is ancient, and is pretty slow in many aspects (e.g. json parsing) when compared with Python 2.7. Some libraries that Spark depend on stopped supporting 2.6. We can still convince the library maintainers to support 2.6, but it will be extra work. I'm curious if anybody still uses Python 2.6 to run Spark.
>>
>> Thanks.
>
> --
> Aleksandar Kacanski

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org
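
Dmitry's virtualenv setup is straightforward to sketch. The snippet below
(a minimal sketch; the paths are hypothetical) launches a PySpark job with
a virtualenv's Python 2.7 while leaving the RHEL 6 system Python 2.6
untouched. PYSPARK_PYTHON selects the interpreter used on the executors,
and PYSPARK_DRIVER_PYTHON the one used on the driver:

    import os
    import subprocess

    env = dict(os.environ)
    # Point Spark at the virtualenv interpreter instead of /usr/bin/python (2.6).
    env["PYSPARK_PYTHON"] = "/opt/venvs/spark27/bin/python"         # hypothetical path
    env["PYSPARK_DRIVER_PYTHON"] = "/opt/venvs/spark27/bin/python"  # hypothetical path

    # The system Python 2.6 is never modified; only this job uses 2.7.
    subprocess.check_call(["/opt/spark/bin/spark-submit", "my_job.py"], env=env)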


Re: [discuss] dropping Python 2.6 support

Posted by Sasha Kacanski <sk...@gmail.com>.
+1
Companies that use the stock Python 2.6 in Red Hat will need to upgrade or
install a fresh version, which takes a total of 3.5 minutes, so no issues ...

On Tue, Jan 5, 2016 at 2:17 AM, Reynold Xin <rx...@databricks.com> wrote:

> Does anybody here care about us dropping support for Python 2.6 in Spark
> 2.0?
>
> Python 2.6 is ancient, and is pretty slow in many aspects (e.g. json
> parsing) when compared with Python 2.7. Some libraries that Spark depend on
> stopped supporting 2.6. We can still convince the library maintainers to
> support 2.6, but it will be extra work. I'm curious if anybody still uses
> Python 2.6 to run Spark.
>
> Thanks.
>
>
>


-- 
Aleksandar Kacanski

Re: [discuss] dropping Python 2.6 support

Posted by shane knapp <sk...@berkeley.edu>.
(this is a build system-specific reply, but quite pertinent to the conversation)

we currently test spark on a centos 6.X deployment, but in the next
~month will be bumping everything to centos 7.  by default, centos 7
comes w/python 2.7.5 installed as the system python.  for any builds
that need python 2.6 (spark 1.5 and earlier), we'll be using anaconda
environments to manage them.

i'm generally VERY happy with how easy it is to manage our three
different python environments with anaconda, and don't plan on
changing that at all in the foreseeable future.

shane

On Mon, Jan 11, 2016 at 8:52 AM, David Chin <da...@drexel.edu> wrote:
> FWIW, RHEL 6 still uses Python 2.6, although 2.7.8 and 3.3.2 are available
> through Red Hat Software Collections. See:
> https://www.softwarecollections.org/en/
>
> I run an academic compute cluster on RHEL 6. We do, however, provide Python
> 2.7.x and 3.5.x via modulefiles.
>
> On Tue, Jan 5, 2016 at 8:45 AM, Nicholas Chammas
> <ni...@gmail.com> wrote:
>>
>> +1
>>
>> Red Hat supports Python 2.6 on RHEL 5 until 2020, but otherwise yes,
>> Python 2.6 is ancient history and the core Python developers stopped
>> supporting it in 2013. RHEL 5 is not a good enough reason to continue
>> support for Python 2.6 IMO.
>>
>> We should aim to support Python 2.7 and Python 3.3+ (which I believe we
>> currently do).
>>
>> Nick
>>
>> On Tue, Jan 5, 2016 at 8:01 AM Allen Zhang <al...@126.com> wrote:
>>>
>>> plus 1,
>>>
>>> we are currently using python 2.7.2 in production environment.
>>>
>>>
>>>
>>>
>>>
>>> On 2016-01-05 18:11:45, "Meethu Mathew" <me...@flytxt.com> wrote:
>>>
>>> +1
>>> We use Python 2.7
>>>
>>> Regards,
>>>
>>> Meethu Mathew
>>>
>>> On Tue, Jan 5, 2016 at 12:47 PM, Reynold Xin <rx...@databricks.com> wrote:
>>>>
>>>> Does anybody here care about us dropping support for Python 2.6 in Spark
>>>> 2.0?
>>>>
>>>> Python 2.6 is ancient, and is pretty slow in many aspects (e.g. json
>>>> parsing) when compared with Python 2.7. Some libraries that Spark depend on
>>>> stopped supporting 2.6. We can still convince the library maintainers to
>>>> support 2.6, but it will be extra work. I'm curious if anybody still uses
>>>> Python 2.6 to run Spark.
>>>>
>>>> Thanks.
>>>>
>>>>
>>>
>
>
>
> --
> David Chin, Ph.D.
> david.chin@drexel.edu    Sr. Systems Administrator, URCF, Drexel U.
> http://www.drexel.edu/research/urcf/
> https://linuxfollies.blogspot.com/
> +1.215.221.4747 (mobile)
> https://github.com/prehensilecode
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: [discuss] dropping Python 2.6 support

Posted by David Chin <da...@drexel.edu>.
FWIW, RHEL 6 still uses Python 2.6, although 2.7.8 and 3.3.2 are available
through Red Hat Software Collections. See:
https://www.softwarecollections.org/en/

I run an academic compute cluster on RHEL 6. We do, however, provide Python
2.7.x and 3.5.x via modulefiles.

On Tue, Jan 5, 2016 at 8:45 AM, Nicholas Chammas <nicholas.chammas@gmail.com
> wrote:

> +1
>
> Red Hat supports Python 2.6 on RHEL 5 until 2020
> <https://alexgaynor.net/2015/mar/30/red-hat-open-source-community/>, but
> otherwise yes, Python 2.6 is ancient history and the core Python developers
> stopped supporting it in 2013. RHEL 5 is not a good enough reason to
> continue support for Python 2.6 IMO.
>
> We should aim to support Python 2.7 and Python 3.3+ (which I believe we
> currently do).
>
> Nick
>
> On Tue, Jan 5, 2016 at 8:01 AM Allen Zhang <al...@126.com> wrote:
>
>> plus 1,
>>
>> we are currently using python 2.7.2 in production environment.
>>
>>
>>
>>
>>
>> On 2016-01-05 18:11:45, "Meethu Mathew" <me...@flytxt.com> wrote:
>>
>> +1
>> We use Python 2.7
>>
>> Regards,
>>
>> Meethu Mathew
>>
>> On Tue, Jan 5, 2016 at 12:47 PM, Reynold Xin <rx...@databricks.com> wrote:
>>
>>> Does anybody here care about us dropping support for Python 2.6 in Spark
>>> 2.0?
>>>
>>> Python 2.6 is ancient, and is pretty slow in many aspects (e.g. json
>>> parsing) when compared with Python 2.7. Some libraries that Spark depend on
>>> stopped supporting 2.6. We can still convince the library maintainers to
>>> support 2.6, but it will be extra work. I'm curious if anybody still uses
>>> Python 2.6 to run Spark.
>>>
>>> Thanks.
>>>
>>>
>>>
>>


-- 
David Chin, Ph.D.
david.chin@drexel.edu    Sr. Systems Administrator, URCF, Drexel U.
http://www.drexel.edu/research/urcf/
https://linuxfollies.blogspot.com/
+1.215.221.4747 (mobile)
https://github.com/prehensilecode

Re: [discuss] dropping Python 2.6 support

Posted by Ted Yu <yu...@gmail.com>.
+1

> On Jan 5, 2016, at 10:49 AM, Davies Liu <da...@databricks.com> wrote:
> 
> +1
> 
> On Tue, Jan 5, 2016 at 5:45 AM, Nicholas Chammas
> <ni...@gmail.com> wrote:
>> +1
>> 
>> Red Hat supports Python 2.6 on RHEL 5 until 2020, but otherwise yes, Python
>> 2.6 is ancient history and the core Python developers stopped supporting it
>> in 2013. RHEL 5 is not a good enough reason to continue support for Python
>> 2.6 IMO.
>> 
>> We should aim to support Python 2.7 and Python 3.3+ (which I believe we
>> currently do).
>> 
>> Nick
>> 
>>> On Tue, Jan 5, 2016 at 8:01 AM Allen Zhang <al...@126.com> wrote:
>>> 
>>> plus 1,
>>> 
>>> we are currently using python 2.7.2 in production environment.
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On 2016-01-05 18:11:45, "Meethu Mathew" <me...@flytxt.com> wrote:
>>> 
>>> +1
>>> We use Python 2.7
>>> 
>>> Regards,
>>> 
>>> Meethu Mathew
>>> 
>>>> On Tue, Jan 5, 2016 at 12:47 PM, Reynold Xin <rx...@databricks.com> wrote:
>>>> 
>>>> Does anybody here care about us dropping support for Python 2.6 in Spark
>>>> 2.0?
>>>> 
>>>> Python 2.6 is ancient, and is pretty slow in many aspects (e.g. json
>>>> parsing) when compared with Python 2.7. Some libraries that Spark depend on
>>>> stopped supporting 2.6. We can still convince the library maintainers to
>>>> support 2.6, but it will be extra work. I'm curious if anybody still uses
>>>> Python 2.6 to run Spark.
>>>> 
>>>> Thanks.
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> For additional commands, e-mail: user-help@spark.apache.org
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: [discuss] dropping Python 2.6 support

Posted by Davies Liu <da...@databricks.com>.
+1

On Tue, Jan 5, 2016 at 5:45 AM, Nicholas Chammas
<ni...@gmail.com> wrote:
> +1
>
> Red Hat supports Python 2.6 on RHEL 5 until 2020, but otherwise yes, Python
> 2.6 is ancient history and the core Python developers stopped supporting it
> in 2013. RHEL 5 is not a good enough reason to continue support for Python
> 2.6 IMO.
>
> We should aim to support Python 2.7 and Python 3.3+ (which I believe we
> currently do).
>
> Nick
>
> On Tue, Jan 5, 2016 at 8:01 AM Allen Zhang <al...@126.com> wrote:
>>
>> plus 1,
>>
>> we are currently using python 2.7.2 in production environment.
>>
>>
>>
>>
>>
>> On 2016-01-05 18:11:45, "Meethu Mathew" <me...@flytxt.com> wrote:
>>
>> +1
>> We use Python 2.7
>>
>> Regards,
>>
>> Meethu Mathew
>>
>> On Tue, Jan 5, 2016 at 12:47 PM, Reynold Xin <rx...@databricks.com> wrote:
>>>
>>> Does anybody here care about us dropping support for Python 2.6 in Spark
>>> 2.0?
>>>
>>> Python 2.6 is ancient, and is pretty slow in many aspects (e.g. json
>>> parsing) when compared with Python 2.7. Some libraries that Spark depend on
>>> stopped supporting 2.6. We can still convince the library maintainers to
>>> support 2.6, but it will be extra work. I'm curious if anybody still uses
>>> Python 2.6 to run Spark.
>>>
>>> Thanks.
>>>
>>>
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: [discuss] dropping Python 2.6 support

Posted by Davies Liu <da...@databricks.com>.
Created JIRA: https://issues.apache.org/jira/browse/SPARK-12661

On Tue, Jan 5, 2016 at 2:49 PM, Koert Kuipers <ko...@tresata.com> wrote:
> i do not think so.
>
> does the python 2.7 need to be installed on all slaves? if so, we do not
> have direct access to those.
>
> also, spark is easy for us to ship with our software since its apache 2
> licensed, and it only needs to be present on the machine that launches the
> app (thanks to yarn).
> even if python 2.7 was needed only on this one machine that launches the app
> we can not ship it with our software because its gpl licensed, so the client
> would have to download it and install it themselves, and this would mean its
> an independent install which has to be audited and approved and now you are
> in for a lot of fun. basically it will never happen.
>
>
> On Tue, Jan 5, 2016 at 5:35 PM, Josh Rosen <jo...@databricks.com> wrote:
>>
>> If users are able to install Spark 2.0 on their RHEL clusters, then I
>> imagine that they're also capable of installing a standalone Python
>> alongside that Spark version (without changing Python systemwide). For
>> instance, Anaconda/Miniconda make it really easy to install Python 2.7.x/3.x
>> without impacting / changing the system Python and doesn't require any
>> special permissions to install (you don't need root / sudo access). Does
>> this address the Python versioning concerns for RHEL users?
>>
>> On Tue, Jan 5, 2016 at 2:33 PM, Koert Kuipers <ko...@tresata.com> wrote:
>>>
>>> yeah, the practical concern is that we have no control over java or
>>> python version on large company clusters. our current reality for the vast
>>> majority of them is java 7 and python 2.6, no matter how outdated that is.
>>>
>>> i dont like it either, but i cannot change it.
>>>
>>> we currently don't use pyspark so i have no stake in this, but if we did
>>> i can assure you we would not upgrade to spark 2.x if python 2.6 was
>>> dropped. no point in developing something that doesnt run for majority of
>>> customers.
>>>
>>> On Tue, Jan 5, 2016 at 5:19 PM, Nicholas Chammas
>>> <ni...@gmail.com> wrote:
>>>>
>>>> As I pointed out in my earlier email, RHEL will support Python 2.6 until
>>>> 2020. So I'm assuming these large companies will have the option of riding
>>>> out Python 2.6 until then.
>>>>
>>>> Are we seriously saying that Spark should likewise support Python 2.6
>>>> for the next several years? Even though the core Python devs stopped
>>>> supporting it in 2013?
>>>>
>>>> If that's not what we're suggesting, then when, roughly, can we drop
>>>> support? What are the criteria?
>>>>
>>>> I understand the practical concern here. If companies are stuck using
>>>> 2.6, it doesn't matter to them that it is deprecated. But balancing that
>>>> concern against the maintenance burden on this project, I would say that
>>>> "upgrade to Python 2.7 or stay on Spark 1.6.x" is a reasonable position to
>>>> take. There are many tiny annoyances one has to put up with to support 2.6.
>>>>
>>>> I suppose if our main PySpark contributors are fine putting up with
>>>> those annoyances, then maybe we don't need to drop support just yet...
>>>>
>>>> Nick
>>>> On Tue, Jan 5, 2016 at 2:27 PM, Julio Antonio Soto de Vicente
>>>> <ju...@esbet.es> wrote:
>>>>>
>>>>> Unfortunately, Koert is right.
>>>>>
>>>>> I've been in a couple of projects using Spark (banking industry) where
>>>>> CentOS + Python 2.6 is the toolbox available.
>>>>>
>>>>> That said, I believe it should not be a concern for Spark. Python 2.6
>>>>> is old and busted, which is totally opposite to the Spark philosophy IMO.
>>>>>
>>>>>
>>>>> El 5 ene 2016, a las 20:07, Koert Kuipers <ko...@tresata.com> escribió:
>>>>>
>>>>> rhel/centos 6 ships with python 2.6, doesnt it?
>>>>>
>>>>> if so, i still know plenty of large companies where python 2.6 is the
>>>>> only option. asking them for python 2.7 is not going to work
>>>>>
>>>>> so i think its a bad idea
>>>>>
>>>>> On Tue, Jan 5, 2016 at 1:52 PM, Juliet Hougland
>>>>> <ju...@gmail.com> wrote:
>>>>>>
>>>>>> I don't see a reason Spark 2.0 would need to support Python 2.6. At
>>>>>> this point, Python 3 should be the default that is encouraged.
>>>>>> Most organizations acknowledge the 2.7 is common, but lagging behind
>>>>>> the version they should theoretically use. Dropping python 2.6
>>>>>> support sounds very reasonable to me.
>>>>>>
>>>>>> On Tue, Jan 5, 2016 at 5:45 AM, Nicholas Chammas
>>>>>> <ni...@gmail.com> wrote:
>>>>>>>
>>>>>>> +1
>>>>>>>
>>>>>>> Red Hat supports Python 2.6 on RHEL 5 until 2020, but otherwise yes,
>>>>>>> Python 2.6 is ancient history and the core Python developers stopped
>>>>>>> supporting it in 2013. RHEL 5 is not a good enough reason to continue
>>>>>>> support for Python 2.6 IMO.
>>>>>>>
>>>>>>> We should aim to support Python 2.7 and Python 3.3+ (which I believe
>>>>>>> we currently do).
>>>>>>>
>>>>>>> Nick
>>>>>>>
>>>>>>> On Tue, Jan 5, 2016 at 8:01 AM Allen Zhang <al...@126.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> plus 1,
>>>>>>>>
>>>>>>>> we are currently using python 2.7.2 in production environment.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2016-01-05 18:11:45, "Meethu Mathew" <me...@flytxt.com> wrote:
>>>>>>>>
>>>>>>>> +1
>>>>>>>> We use Python 2.7
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> Meethu Mathew
>>>>>>>>
>>>>>>>> On Tue, Jan 5, 2016 at 12:47 PM, Reynold Xin <rx...@databricks.com>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Does anybody here care about us dropping support for Python 2.6 in
>>>>>>>>> Spark 2.0?
>>>>>>>>>
>>>>>>>>> Python 2.6 is ancient, and is pretty slow in many aspects (e.g.
>>>>>>>>> json parsing) when compared with Python 2.7. Some libraries that Spark
>>>>>>>>> depend on stopped supporting 2.6. We can still convince the library
>>>>>>>>> maintainers to support 2.6, but it will be extra work. I'm curious if
>>>>>>>>> anybody still uses Python 2.6 to run Spark.
>>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>
>>>
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org
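
SPARK-12661 tracks the removal. Mechanically, dropping an interpreter
version usually comes down to an import-time guard plus deleting the old
workarounds; a minimal, illustrative sketch of such a guard (not Spark's
actual code):

    import sys

    # Refuse to start on interpreters older than 2.7.
    if sys.version_info[:2] < (2, 7):
        raise RuntimeError(
            "Python %d.%d is no longer supported; please use Python 2.7 or 3.3+"
            % sys.version_info[:2])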


Re: [discuss] dropping Python 2.6 support

Posted by Koert Kuipers <ko...@tresata.com>.
if python 2.7 only has to be present on the node that launches the app
(does it?) then that could be important indeed.

On Tue, Jan 5, 2016 at 6:02 PM, Koert Kuipers <ko...@tresata.com> wrote:

> interesting i didnt know that!
>
> On Tue, Jan 5, 2016 at 5:57 PM, Nicholas Chammas <
> nicholas.chammas@gmail.com> wrote:
>
>> even if python 2.7 was needed only on this one machine that launches the
>> app we can not ship it with our software because its gpl licensed
>>
>> Not to nitpick, but maybe this is important. The Python license is GPL-compatible
>> but not GPL <https://docs.python.org/3/license.html>:
>>
>> Note GPL-compatible doesn’t mean that we’re distributing Python under the
>> GPL. All Python licenses, unlike the GPL, let you distribute a modified
>> version without making your changes open source. The GPL-compatible
>> licenses make it possible to combine Python with other software that is
>> released under the GPL; the others don’t.
>>
>> Nick
>>
>>
>> On Tue, Jan 5, 2016 at 5:49 PM Koert Kuipers <ko...@tresata.com> wrote:
>>
>>> i do not think so.
>>>
>>> does the python 2.7 need to be installed on all slaves? if so, we do not
>>> have direct access to those.
>>>
>>> also, spark is easy for us to ship with our software since its apache 2
>>> licensed, and it only needs to be present on the machine that launches the
>>> app (thanks to yarn).
>>> even if python 2.7 was needed only on this one machine that launches the
>>> app we can not ship it with our software because its gpl licensed, so the
>>> client would have to download it and install it themselves, and this would
>>> mean its an independent install which has to be audited and approved and
>>> now you are in for a lot of fun. basically it will never happen.
>>>
>>>
>>> On Tue, Jan 5, 2016 at 5:35 PM, Josh Rosen <jo...@databricks.com>
>>> wrote:
>>>
>>>> If users are able to install Spark 2.0 on their RHEL clusters, then I
>>>> imagine that they're also capable of installing a standalone Python
>>>> alongside that Spark version (without changing Python systemwide). For
>>>> instance, Anaconda/Miniconda make it really easy to install Python
>>>> 2.7.x/3.x without impacting / changing the system Python and doesn't
>>>> require any special permissions to install (you don't need root / sudo
>>>> access). Does this address the Python versioning concerns for RHEL users?
>>>>
>>>> On Tue, Jan 5, 2016 at 2:33 PM, Koert Kuipers <ko...@tresata.com>
>>>> wrote:
>>>>
>>>>> yeah, the practical concern is that we have no control over java or
>>>>> python version on large company clusters. our current reality for the vast
>>>>> majority of them is java 7 and python 2.6, no matter how outdated that is.
>>>>>
>>>>> i dont like it either, but i cannot change it.
>>>>>
>>>>> we currently don't use pyspark so i have no stake in this, but if we
>>>>> did i can assure you we would not upgrade to spark 2.x if python 2.6 was
>>>>> dropped. no point in developing something that doesnt run for majority of
>>>>> customers.
>>>>>
>>>>> On Tue, Jan 5, 2016 at 5:19 PM, Nicholas Chammas <
>>>>> nicholas.chammas@gmail.com> wrote:
>>>>>
>>>>>> As I pointed out in my earlier email, RHEL will support Python 2.6
>>>>>> until 2020. So I'm assuming these large companies will have the option of
>>>>>> riding out Python 2.6 until then.
>>>>>>
>>>>>> Are we seriously saying that Spark should likewise support Python 2.6
>>>>>> for the next several years? Even though the core Python devs stopped
>>>>>> supporting it in 2013?
>>>>>>
>>>>>> If that's not what we're suggesting, then when, roughly, can we drop
>>>>>> support? What are the criteria?
>>>>>>
>>>>>> I understand the practical concern here. If companies are stuck using
>>>>>> 2.6, it doesn't matter to them that it is deprecated. But balancing that
>>>>>> concern against the maintenance burden on this project, I would say that
>>>>>> "upgrade to Python 2.7 or stay on Spark 1.6.x" is a reasonable position to
>>>>>> take. There are many tiny annoyances one has to put up with to support 2.6.
>>>>>>
>>>>>> I suppose if our main PySpark contributors are fine putting up with
>>>>>> those annoyances, then maybe we don't need to drop support just yet...
>>>>>>
>>>>>> Nick
>>>>>> On Tue, Jan 5, 2016 at 2:27 PM, Julio Antonio Soto de Vicente <
>>>>>> julio@esbet.es> wrote:
>>>>>>
>>>>>>> Unfortunately, Koert is right.
>>>>>>>
>>>>>>> I've been in a couple of projects using Spark (banking industry)
>>>>>>> where CentOS + Python 2.6 is the toolbox available.
>>>>>>>
>>>>>>> That said, I believe it should not be a concern for Spark. Python
>>>>>>> 2.6 is old and busted, which is totally opposite to the Spark philosophy
>>>>>>> IMO.
>>>>>>>
>>>>>>>
>>>>>>> El 5 ene 2016, a las 20:07, Koert Kuipers <ko...@tresata.com>
>>>>>>> escribió:
>>>>>>>
>>>>>>> rhel/centos 6 ships with python 2.6, doesnt it?
>>>>>>>
>>>>>>> if so, i still know plenty of large companies where python 2.6 is
>>>>>>> the only option. asking them for python 2.7 is not going to work
>>>>>>>
>>>>>>> so i think its a bad idea
>>>>>>>
>>>>>>> On Tue, Jan 5, 2016 at 1:52 PM, Juliet Hougland <
>>>>>>> juliet.hougland@gmail.com> wrote:
>>>>>>>
>>>>>>>> I don't see a reason Spark 2.0 would need to support Python 2.6. At
>>>>>>>> this point, Python 3 should be the default that is encouraged.
>>>>>>>> Most organizations acknowledge the 2.7 is common, but lagging
>>>>>>>> behind the version they should theoretically use. Dropping python 2.6
>>>>>>>> support sounds very reasonable to me.
>>>>>>>>
>>>>>>>> On Tue, Jan 5, 2016 at 5:45 AM, Nicholas Chammas <
>>>>>>>> nicholas.chammas@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> +1
>>>>>>>>>
>>>>>>>>> Red Hat supports Python 2.6 on RHEL 5 until 2020
>>>>>>>>> <https://alexgaynor.net/2015/mar/30/red-hat-open-source-community/>,
>>>>>>>>> but otherwise yes, Python 2.6 is ancient history and the core Python
>>>>>>>>> developers stopped supporting it in 2013. RHEL 5 is not a good enough
>>>>>>>>> reason to continue support for Python 2.6 IMO.
>>>>>>>>>
>>>>>>>>> We should aim to support Python 2.7 and Python 3.3+ (which I
>>>>>>>>> believe we currently do).
>>>>>>>>>
>>>>>>>>> Nick
>>>>>>>>>
>>>>>>>>> On Tue, Jan 5, 2016 at 8:01 AM Allen Zhang <al...@126.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> plus 1,
>>>>>>>>>>
>>>>>>>>>> we are currently using python 2.7.2 in production environment.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 2016-01-05 18:11:45, "Meethu Mathew" <me...@flytxt.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> +1
>>>>>>>>>> We use Python 2.7
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>>
>>>>>>>>>> Meethu Mathew
>>>>>>>>>>
>>>>>>>>>> On Tue, Jan 5, 2016 at 12:47 PM, Reynold Xin <rxin@databricks.com
>>>>>>>>>> > wrote:
>>>>>>>>>>
>>>>>>>>>>> Does anybody here care about us dropping support for Python 2.6
>>>>>>>>>>> in Spark 2.0?
>>>>>>>>>>>
>>>>>>>>>>> Python 2.6 is ancient, and is pretty slow in many aspects (e.g.
>>>>>>>>>>> json parsing) when compared with Python 2.7. Some libraries that Spark
>>>>>>>>>>> depend on stopped supporting 2.6. We can still convince the library
>>>>>>>>>>> maintainers to support 2.6, but it will be extra work. I'm curious if
>>>>>>>>>>> anybody still uses Python 2.6 to run Spark.
>>>>>>>>>>>
>>>>>>>>>>> Thanks.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>
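
As other replies in this thread note, the executors need a compatible
interpreter too, not just the node that launches the app. A minimal sketch
for checking what the executors actually run (assumes PYSPARK_PYTHON
already points at the intended interpreter on every node):

    import platform

    from pyspark import SparkContext

    sc = SparkContext(appName="python-version-check")
    # Report the interpreter version seen by the driver and by each executor.
    versions = (sc.parallelize(range(100))
                  .map(lambda _: platform.python_version())
                  .distinct()
                  .collect())
    print("driver: %s, executors: %s" % (platform.python_version(), versions))
    sc.stop()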

Re: [discuss] dropping Python 2.6 support

Posted by Steve Loughran <st...@hortonworks.com>.
> On 7 Jan 2016, at 19:55, Juliet Hougland <ju...@gmail.com> wrote:
> 
> @ Reynold Xin @Josh Rosen: What is current maintenance burden of supporting Python 2.6? What libraries are no longer supporting Python 2.6 and where does Spark use them?
> 

Generally the cost comes in the test matrix: one more thing to test against. You can cover the extremes with the right VMs (for me: kerberos-java7-linux and windows-server+java-8), but you still need to keep the number of combinations down, and be set up to locally debug and replicate problems.



Re: [discuss] dropping Python 2.6 support

Posted by Juliet Hougland <ju...@gmail.com>.
@Reynold Xin @Josh Rosen: What is the current maintenance burden of
supporting Python 2.6? What libraries no longer support Python 2.6, and
where does Spark use them?


On Tue, Jan 5, 2016 at 5:40 PM, Jeff Zhang <zj...@gmail.com> wrote:

> +1
>
> On Wed, Jan 6, 2016 at 9:18 AM, Juliet Hougland <juliet.hougland@gmail.com
> > wrote:
>
>> Most admins I talk to about python and spark are already actively (or on
>> their way to) managing their cluster python installations. Even if people
>> begin using the system python with pyspark, there is eventually a user who
>> needs a complex dependency (like pandas or sklearn) on the cluster. No
>> admin would muck around installing libs into system python, so you end up
>> with other python installations.
>>
>> Installing a non-system python is something users intending to use
>> pyspark on a real cluster should be thinking about, eventually, anyway. It
>> would work in situations where people are running pyspark locally or
>> actively managing python installations on a cluster. There is an awkward
>> middle point where someone has installed spark but not configured their
>> cluster (by installing non default python) in any other way. Most clusters
>> I see are RHEL/CentOS and have something other than system python used by
>> spark.
>>
>> What libraries stopped supporting python 2.6 and where does spark use
>> them? The "ease of transitioning to pyspark onto a cluster" problem may be
>> an easier pill to swallow if it only affected something like mllib or spark
>> sql and not parts of the core api. You end up hoping numpy or pandas are
>> installed in the runtime components of spark anyway. At that point people
>> really should just go install a non system python. There are tradeoffs to
>> using pyspark and I feel pretty fine explaining to people that managing
>> their cluster's python installations is something that comes with using
>> pyspark.
>>
>> RHEL/CentOS is so common that this would probably be a little work for a
>> lot of people.
>>
>> --Juliet
>>
>> On Tue, Jan 5, 2016 at 4:07 PM, Koert Kuipers <ko...@tresata.com> wrote:
>>
>>> hey evil admin:)
>>> i think the bit about java was from me?
>>> if so, i meant to indicate that the reality for us is java is 1.7 on
>>> most (all?) clusters. i do not believe spark prefers java 1.8. my point was
>>> that even although java 1.7 is getting old as well it would be a major
>>> issue for me if spark dropped java 1.7 support.
>>>
>>> On Tue, Jan 5, 2016 at 6:53 PM, Carlile, Ken <ca...@janelia.hhmi.org>
>>> wrote:
>>>
>>>> As one of the evil administrators that runs a RHEL 6 cluster, we
>>>> already provide quite a few different version of python on our cluster
>>>> pretty darn easily. All you need is a separate install directory and to set
>>>> the PYTHON_HOME environment variable to point to the correct python, then
>>>> have the users make sure the correct python is in their PATH. I understand
>>>> that other administrators may not be so compliant.
>>>>
>>>> Saw a small bit about the java version in there; does Spark currently
>>>> prefer Java 1.8.x?
>>>>
>>>> —Ken
>>>>
>>>> On Jan 5, 2016, at 6:08 PM, Josh Rosen <jo...@databricks.com>
>>>> wrote:
>>>>
>>>> Note that you _can_ use a Python 2.7 `ipython` executable on the driver
>>>>> while continuing to use a vanilla `python` executable on the executors
>>>>
>>>>
>>>> Whoops, just to be clear, this should actually read "while continuing
>>>> to use a vanilla `python` 2.7 executable".
>>>>
>>>> On Tue, Jan 5, 2016 at 3:07 PM, Josh Rosen <jo...@databricks.com>
>>>> wrote:
>>>>
>>>>> Yep, the driver and executors need to have compatible Python versions.
>>>>> I think that there are some bytecode-level incompatibilities between 2.6
>>>>> and 2.7 which would impact the deserialization of Python closures, so I
>>>>> think you need to be running the same 2.x version for all communicating
>>>>> Spark processes. Note that you _can_ use a Python 2.7 `ipython` executable
>>>>> on the driver while continuing to use a vanilla `python` executable on the
>>>>> executors (we have environment variables which allow you to control these
>>>>> separately).
>>>>>
>>>>> On Tue, Jan 5, 2016 at 3:05 PM, Nicholas Chammas <
>>>>> nicholas.chammas@gmail.com> wrote:
>>>>>
>>>>>> I think all the slaves need the same (or a compatible) version of
>>>>>> Python installed since they run Python code in PySpark jobs natively.
>>>>>>
>>>>>> On Tue, Jan 5, 2016 at 6:02 PM Koert Kuipers <ko...@tresata.com>
>>>>>> wrote:
>>>>>>
>>>>>>> interesting i didnt know that!
>>>>>>>
>>>>>>> On Tue, Jan 5, 2016 at 5:57 PM, Nicholas Chammas <
>>>>>>> nicholas.chammas@gmail.com> wrote:
>>>>>>>
>>>>>>>> even if python 2.7 was needed only on this one machine that
>>>>>>>> launches the app we can not ship it with our software because its gpl
>>>>>>>> licensed
>>>>>>>>
>>>>>>>> Not to nitpick, but maybe this is important. The Python license is GPL-compatible
>>>>>>>> but not GPL <https://docs.python.org/3/license.html>:
>>>>>>>>
>>>>>>>> Note GPL-compatible doesn’t mean that we’re distributing Python
>>>>>>>> under the GPL. All Python licenses, unlike the GPL, let you distribute a
>>>>>>>> modified version without making your changes open source. The
>>>>>>>> GPL-compatible licenses make it possible to combine Python with other
>>>>>>>> software that is released under the GPL; the others don’t.
>>>>>>>>
>>>>>>>> Nick
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Jan 5, 2016 at 5:49 PM Koert Kuipers <ko...@tresata.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> i do not think so.
>>>>>>>>>
>>>>>>>>> does the python 2.7 need to be installed on all slaves? if so, we
>>>>>>>>> do not have direct access to those.
>>>>>>>>>
>>>>>>>>> also, spark is easy for us to ship with our software since its
>>>>>>>>> apache 2 licensed, and it only needs to be present on the machine that
>>>>>>>>> launches the app (thanks to yarn).
>>>>>>>>> even if python 2.7 was needed only on this one machine that
>>>>>>>>> launches the app we can not ship it with our software because its gpl
>>>>>>>>> licensed, so the client would have to download it and install it
>>>>>>>>> themselves, and this would mean its an independent install which has to be
>>>>>>>>> audited and approved and now you are in for a lot of fun. basically it will
>>>>>>>>> never happen.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Jan 5, 2016 at 5:35 PM, Josh Rosen <
>>>>>>>>> joshrosen@databricks.com> wrote:
>>>>>>>>>
>>>>>>>>>> If users are able to install Spark 2.0 on their RHEL clusters,
>>>>>>>>>> then I imagine that they're also capable of installing a standalone Python
>>>>>>>>>> alongside that Spark version (without changing Python systemwide). For
>>>>>>>>>> instance, Anaconda/Miniconda make it really easy to install Python
>>>>>>>>>> 2.7.x/3.x without impacting / changing the system Python and doesn't
>>>>>>>>>> require any special permissions to install (you don't need root / sudo
>>>>>>>>>> access). Does this address the Python versioning concerns for RHEL users?
>>>>>>>>>>
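
A quick way to confirm a job really picked up a side-installed interpreter
rather than the system one; the /usr/bin test is only illustrative, and a
Miniconda install typically lives under $HOME:

    import sys

    # e.g. /home/me/miniconda2/bin/python vs. the system /usr/bin/python
    print(sys.executable)
    assert not sys.executable.startswith("/usr/bin"), "still on system python"
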
>>>>>>>>>> On Tue, Jan 5, 2016 at 2:33 PM, Koert Kuipers <ko...@tresata.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> yeah, the practical concern is that we have no control over java
>>>>>>>>>>> or python version on large company clusters. our current reality for the
>>>>>>>>>>> vast majority of them is java 7 and python 2.6, no matter how outdated that
>>>>>>>>>>> is.
>>>>>>>>>>>
>>>>>>>>>>> i dont like it either, but i cannot change it.
>>>>>>>>>>>
>>>>>>>>>>> we currently don't use pyspark so i have no stake in this, but
>>>>>>>>>>> if we did i can assure you we would not upgrade to spark 2.x if python 2.6
>>>>>>>>>>> was dropped. no point in developing something that doesnt run for majority
>>>>>>>>>>> of customers.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jan 5, 2016 at 5:19 PM, Nicholas Chammas <
>>>>>>>>>>> nicholas.chammas@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> As I pointed out in my earlier email, RHEL will support Python
>>>>>>>>>>>> 2.6 until 2020. So I'm assuming these large companies will have the option
>>>>>>>>>>>> of riding out Python 2.6 until then.
>>>>>>>>>>>>
>>>>>>>>>>>> Are we seriously saying that Spark should likewise support
>>>>>>>>>>>> Python 2.6 for the next several years? Even though the core Python devs
>>>>>>>>>>>> stopped supporting it in 2013?
>>>>>>>>>>>>
>>>>>>>>>>>> If that's not what we're suggesting, then when, roughly, can we
>>>>>>>>>>>> drop support? What are the criteria?
>>>>>>>>>>>>
>>>>>>>>>>>> I understand the practical concern here. If companies are stuck
>>>>>>>>>>>> using 2.6, it doesn't matter to them that it is deprecated. But balancing
>>>>>>>>>>>> that concern against the maintenance burden on this project, I would say
>>>>>>>>>>>> that "upgrade to Python 2.7 or stay on Spark 1.6.x" is a reasonable
>>>>>>>>>>>> position to take. There are many tiny annoyances one has to put up with to
>>>>>>>>>>>> support 2.6.
>>>>>>>>>>>>
>>>>>>>>>>>> I suppose if our main PySpark contributors are fine putting up
>>>>>>>>>>>> with those annoyances, then maybe we don't need to drop support just yet...
>>>>>>>>>>>>
>>>>>>>>>>>> Nick
>>>>>>>>>>>> On Tue, Jan 5, 2016 at 2:27 PM, Julio Antonio Soto de Vicente <
>>>>>>>>>>>> julio@esbet.es> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Unfortunately, Koert is right.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I've been in a couple of projects using Spark (banking
>>>>>>>>>>>>> industry) where CentOS + Python 2.6 is the toolbox available.
>>>>>>>>>>>>>
>>>>>>>>>>>>> That said, I believe it should not be a concern for Spark.
>>>>>>>>>>>>> Python 2.6 is old and busted, which is totally opposite to the Spark
>>>>>>>>>>>>> philosophy IMO.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Jan 5, 2016, at 8:07 PM, Koert Kuipers <ko...@tresata.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> rhel/centos 6 ships with python 2.6, doesnt it?
>>>>>>>>>>>>>
>>>>>>>>>>>>> if so, i still know plenty of large companies where python 2.6
>>>>>>>>>>>>> is the only option. asking them for python 2.7 is not going to work
>>>>>>>>>>>>>
>>>>>>>>>>>>> so i think its a bad idea
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Jan 5, 2016 at 1:52 PM, Juliet Hougland <
>>>>>>>>>>>>> juliet.hougland@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I don't see a reason Spark 2.0 would need to support Python
>>>>>>>>>>>>>> 2.6. At this point, Python 3 should be the default that is encouraged.
>>>>>>>>>>>>>> Most organizations acknowledge that 2.7 is common, but lagging
>>>>>>>>>>>>>> behind the version they should theoretically use. Dropping python 2.6
>>>>>>>>>>>>>> support sounds very reasonable to me.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Jan 5, 2016 at 5:45 AM, Nicholas Chammas <
>>>>>>>>>>>>>> nicholas.chammas@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> +1
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Red Hat supports Python 2.6 on RHEL 5 until 2020
>>>>>>>>>>>>>>> <https://alexgaynor.net/2015/mar/30/red-hat-open-source-community/>,
>>>>>>>>>>>>>>> but otherwise yes, Python 2.6 is ancient history and the core Python
>>>>>>>>>>>>>>> developers stopped supporting it in 2013. RHEL 5 is not a good enough
>>>>>>>>>>>>>>> reason to continue support for Python 2.6 IMO.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> We should aim to support Python 2.7 and Python 3.3+ (which I
>>>>>>>>>>>>>>> believe we currently do).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Nick
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Jan 5, 2016 at 8:01 AM Allen Zhang <
>>>>>>>>>>>>>>> allenzhang010@126.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> plus 1,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> we are currently using python 2.7.2 in production
>>>>>>>>>>>>>>>> environment.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 2016-01-05 18:11:45, "Meethu Mathew" <
>>>>>>>>>>>>>>>> meethu.mathew@flytxt.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> +1
>>>>>>>>>>>>>>>> We use Python 2.7
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Meethu Mathew
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Jan 5, 2016 at 12:47 PM, Reynold Xin <
>>>>>>>>>>>>>>>> rxin@databricks.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Does anybody here care about us dropping support for
>>>>>>>>>>>>>>>>> Python 2.6 in Spark 2.0?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Python 2.6 is ancient, and is pretty slow in many aspects
>>>>>>>>>>>>>>>>> (e.g. json parsing) when compared with Python 2.7. Some libraries that
>>>>>>>>>>>>>>>>> Spark depend on stopped supporting 2.6. We can still convince the library
>>>>>>>>>>>>>>>>> maintainers to support 2.6, but it will be extra work. I'm curious if
>>>>>>>>>>>>>>>>> anybody still uses Python 2.6 to run Spark.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>
>
> --
> Best Regards
>
> Jeff Zhang
>

Re: [discuss] dropping Python 2.6 support

Posted by Jeff Zhang <zj...@gmail.com>.
+1

On Wed, Jan 6, 2016 at 9:18 AM, Juliet Hougland <ju...@gmail.com>
wrote:

> Most admins I talk to about python and spark are already actively (or on
> their way to) managing their cluster python installations. Even if people
> begin using the system python with pyspark, there is eventually a user who
> needs a complex dependency (like pandas or sklearn) on the cluster. No
> admin would muck around installing libs into system python, so you end up
> with other python installations.
>
> Installing a non-system python is something users intending to use pyspark
> on a real cluster should be thinking about, eventually, anyway. It would
> work in situations where people are running pyspark locally or actively
> managing python installations on a cluster. There is an awkward middle
> point where someone has installed spark but not configured their cluster
> (by installing non default python) in any other way. Most clusters I see
> are RHEL/CentOS and have something other than system python used by spark.
>
> What libraries stopped supporting python 2.6 and where does spark use
> them? The "ease of transitioning to pyspark onto a cluster" problem may be
> an easier pill to swallow if it only affected something like mllib or spark
> sql and not parts of the core api. You end up hoping numpy or pandas are
> installed in the runtime components of spark anyway. At that point people
> really should just go install a non system python. There are tradeoffs to
> using pyspark and I feel pretty fine explaining to people that managing
> their cluster's python installations is something that comes with using
> pyspark.
>
> RHEL/CentOS is so common that this would probably be a little work for a
> lot of people.
>
> --Juliet
>
> On Tue, Jan 5, 2016 at 4:07 PM, Koert Kuipers <ko...@tresata.com> wrote:
>
>> hey evil admin:)
>> i think the bit about java was from me?
>> if so, i meant to indicate that the reality for us is java is 1.7 on most
>> (all?) clusters. i do not believe spark prefers java 1.8. my point was that
>> even though java 1.7 is getting old as well, it would be a major issue for
>> me if spark dropped java 1.7 support.
>>
>> On Tue, Jan 5, 2016 at 6:53 PM, Carlile, Ken <ca...@janelia.hhmi.org>
>> wrote:
>>
>>> As one of the evil administrators that runs a RHEL 6 cluster, we already
>>> provide quite a few different versions of python on our cluster pretty darn
>>> easily. All you need is a separate install directory and to set the
>>> PYTHON_HOME environment variable to point to the correct python, then have
>>> the users make sure the correct python is in their PATH. I understand that
>>> other administrators may not be so compliant.
>>>
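
The setup described above boils down to a small per-user shim, sketched here
in Python; the /opt/python-2.7 prefix is hypothetical, and PYTHON_HOME is a
local convention rather than something Spark itself reads:

    import os

    # Point at the admin-provided interpreter tree, then put its bin/
    # directory first on PATH so "python" resolves to it.
    os.environ["PYTHON_HOME"] = "/opt/python-2.7"
    os.environ["PATH"] = (os.path.join(os.environ["PYTHON_HOME"], "bin")
                          + os.pathsep + os.environ["PATH"])
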
>>> Saw a small bit about the java version in there; does Spark currently
>>> prefer Java 1.8.x?
>>>
>>> —Ken
>>>
>>> On Jan 5, 2016, at 6:08 PM, Josh Rosen <jo...@databricks.com> wrote:
>>>
>>> Note that you _can_ use a Python 2.7 `ipython` executable on the driver
>>>> while continuing to use a vanilla `python` executable on the executors
>>>
>>>
>>> Whoops, just to be clear, this should actually read "while continuing to
>>> use a vanilla `python` 2.7 executable".
>>>
>>> On Tue, Jan 5, 2016 at 3:07 PM, Josh Rosen <jo...@databricks.com>
>>> wrote:
>>>
>>>> Yep, the driver and executors need to have compatible Python versions.
>>>> I think that there are some bytecode-level incompatibilities between 2.6
>>>> and 2.7 which would impact the deserialization of Python closures, so I
>>>> think you need to be running the same 2.x version for all communicating
>>>> Spark processes. Note that you _can_ use a Python 2.7 `ipython` executable
>>>> on the driver while continuing to use a vanilla `python` executable on the
>>>> executors (we have environment variables which allow you to control these
>>>> separately).
>>>>
>>>> On Tue, Jan 5, 2016 at 3:05 PM, Nicholas Chammas <
>>>> nicholas.chammas@gmail.com> wrote:
>>>>
>>>>> I think all the slaves need the same (or a compatible) version of
>>>>> Python installed since they run Python code in PySpark jobs natively.
>>>>>
>>>>> On Tue, Jan 5, 2016 at 6:02 PM Koert Kuipers <ko...@tresata.com>
>>>>> wrote:
>>>>>
>>>>>> interesting i didnt know that!
>>>>>>
>>>>>> On Tue, Jan 5, 2016 at 5:57 PM, Nicholas Chammas <
>>>>>> nicholas.chammas@gmail.com> wrote:
>>>>>>
>>>>>>> even if python 2.7 was needed only on this one machine that launches
>>>>>>> the app we can not ship it with our software because its gpl licensed
>>>>>>>
>>>>>>> Not to nitpick, but maybe this is important. The Python license is GPL-compatible
>>>>>>> but not GPL <https://docs.python.org/3/license.html>:
>>>>>>>
>>>>>>> Note GPL-compatible doesn’t mean that we’re distributing Python
>>>>>>> under the GPL. All Python licenses, unlike the GPL, let you distribute a
>>>>>>> modified version without making your changes open source. The
>>>>>>> GPL-compatible licenses make it possible to combine Python with other
>>>>>>> software that is released under the GPL; the others don’t.
>>>>>>>
>>>>>>> Nick
>>>>>>> ​
>>>>>>>
>>>>>>> On Tue, Jan 5, 2016 at 5:49 PM Koert Kuipers <ko...@tresata.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> i do not think so.
>>>>>>>>
>>>>>>>> does the python 2.7 need to be installed on all slaves? if so, we
>>>>>>>> do not have direct access to those.
>>>>>>>>
>>>>>>>> also, spark is easy for us to ship with our software since its
>>>>>>>> apache 2 licensed, and it only needs to be present on the machine that
>>>>>>>> launches the app (thanks to yarn).
>>>>>>>> even if python 2.7 was needed only on this one machine that
>>>>>>>> launches the app we can not ship it with our software because its gpl
>>>>>>>> licensed, so the client would have to download it and install it
>>>>>>>> themselves, and this would mean its an independent install which has to be
>>>>>>>> audited and approved and now you are in for a lot of fun. basically it will
>>>>>>>> never happen.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Jan 5, 2016 at 5:35 PM, Josh Rosen <
>>>>>>>> joshrosen@databricks.com> wrote:
>>>>>>>>
>>>>>>>>> If users are able to install Spark 2.0 on their RHEL clusters,
>>>>>>>>> then I imagine that they're also capable of installing a standalone Python
>>>>>>>>> alongside that Spark version (without changing Python systemwide). For
>>>>>>>>> instance, Anaconda/Miniconda make it really easy to install Python
>>>>>>>>> 2.7.x/3.x without impacting / changing the system Python and doesn't
>>>>>>>>> require any special permissions to install (you don't need root / sudo
>>>>>>>>> access). Does this address the Python versioning concerns for RHEL users?
>>>>>>>>>
>>>>>>>>> On Tue, Jan 5, 2016 at 2:33 PM, Koert Kuipers <ko...@tresata.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> yeah, the practical concern is that we have no control over java
>>>>>>>>>> or python version on large company clusters. our current reality for the
>>>>>>>>>> vast majority of them is java 7 and python 2.6, no matter how outdated that
>>>>>>>>>> is.
>>>>>>>>>>
>>>>>>>>>> i dont like it either, but i cannot change it.
>>>>>>>>>>
>>>>>>>>>> we currently don't use pyspark so i have no stake in this, but if
>>>>>>>>>> we did i can assure you we would not upgrade to spark 2.x if python 2.6 was
>>>>>>>>>> dropped. no point in developing something that doesnt run for majority of
>>>>>>>>>> customers.
>>>>>>>>>>
>>>>>>>>>> On Tue, Jan 5, 2016 at 5:19 PM, Nicholas Chammas <
>>>>>>>>>> nicholas.chammas@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> As I pointed out in my earlier email, RHEL will support Python
>>>>>>>>>>> 2.6 until 2020. So I'm assuming these large companies will have the option
>>>>>>>>>>> of riding out Python 2.6 until then.
>>>>>>>>>>>
>>>>>>>>>>> Are we seriously saying that Spark should likewise support
>>>>>>>>>>> Python 2.6 for the next several years? Even though the core Python devs
>>>>>>>>>>> stopped supporting it in 2013?
>>>>>>>>>>>
>>>>>>>>>>> If that's not what we're suggesting, then when, roughly, can we
>>>>>>>>>>> drop support? What are the criteria?
>>>>>>>>>>>
>>>>>>>>>>> I understand the practical concern here. If companies are stuck
>>>>>>>>>>> using 2.6, it doesn't matter to them that it is deprecated. But balancing
>>>>>>>>>>> that concern against the maintenance burden on this project, I would say
>>>>>>>>>>> that "upgrade to Python 2.7 or stay on Spark 1.6.x" is a reasonable
>>>>>>>>>>> position to take. There are many tiny annoyances one has to put up with to
>>>>>>>>>>> support 2.6.
>>>>>>>>>>>
>>>>>>>>>>> I suppose if our main PySpark contributors are fine putting up
>>>>>>>>>>> with those annoyances, then maybe we don't need to drop support just yet...
>>>>>>>>>>>
>>>>>>>>>>> Nick
>>>>>>>>>>> On Tue, Jan 5, 2016 at 2:27 PM, Julio Antonio Soto de Vicente <
>>>>>>>>>>> julio@esbet.es> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Unfortunately, Koert is right.
>>>>>>>>>>>>
>>>>>>>>>>>> I've been in a couple of projects using Spark (banking
>>>>>>>>>>>> industry) where CentOS + Python 2.6 is the toolbox available.
>>>>>>>>>>>>
>>>>>>>>>>>> That said, I believe it should not be a concern for Spark.
>>>>>>>>>>>> Python 2.6 is old and busted, which is totally opposite to the Spark
>>>>>>>>>>>> philosophy IMO.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Jan 5, 2016, at 8:07 PM, Koert Kuipers <ko...@tresata.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> rhel/centos 6 ships with python 2.6, doesnt it?
>>>>>>>>>>>>
>>>>>>>>>>>> if so, i still know plenty of large companies where python 2.6
>>>>>>>>>>>> is the only option. asking them for python 2.7 is not going to work
>>>>>>>>>>>>
>>>>>>>>>>>> so i think its a bad idea
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jan 5, 2016 at 1:52 PM, Juliet Hougland <
>>>>>>>>>>>> juliet.hougland@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I don't see a reason Spark 2.0 would need to support Python
>>>>>>>>>>>>> 2.6. At this point, Python 3 should be the default that is encouraged.
>>>>>>>>>>>>> Most organizations acknowledge that 2.7 is common, but lagging
>>>>>>>>>>>>> behind the version they should theoretically use. Dropping python 2.6
>>>>>>>>>>>>> support sounds very reasonable to me.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Jan 5, 2016 at 5:45 AM, Nicholas Chammas <
>>>>>>>>>>>>> nicholas.chammas@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> +1
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Red Hat supports Python 2.6 on RHEL 5 until 2020
>>>>>>>>>>>>>> <https://alexgaynor.net/2015/mar/30/red-hat-open-source-community/>,
>>>>>>>>>>>>>> but otherwise yes, Python 2.6 is ancient history and the core Python
>>>>>>>>>>>>>> developers stopped supporting it in 2013. RHEL 5 is not a good enough
>>>>>>>>>>>>>> reason to continue support for Python 2.6 IMO.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We should aim to support Python 2.7 and Python 3.3+ (which I
>>>>>>>>>>>>>> believe we currently do).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Nick
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Jan 5, 2016 at 8:01 AM Allen Zhang <
>>>>>>>>>>>>>> allenzhang010@126.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> plus 1,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> we are currently using python 2.7.2 in production
>>>>>>>>>>>>>>> environment.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 2016-01-05 18:11:45, "Meethu Mathew" <
>>>>>>>>>>>>>>> meethu.mathew@flytxt.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> +1
>>>>>>>>>>>>>>> We use Python 2.7
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Meethu Mathew
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Jan 5, 2016 at 12:47 PM, Reynold Xin <
>>>>>>>>>>>>>>> rxin@databricks.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Does anybody here care about us dropping support for Python
>>>>>>>>>>>>>>>> 2.6 in Spark 2.0?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Python 2.6 is ancient, and is pretty slow in many aspects
>>>>>>>>>>>>>>>> (e.g. json parsing) when compared with Python 2.7. Some libraries that
>>>>>>>>>>>>>>>> Spark depend on stopped supporting 2.6. We can still convince the library
>>>>>>>>>>>>>>>> maintainers to support 2.6, but it will be extra work. I'm curious if
>>>>>>>>>>>>>>>> anybody still uses Python 2.6 to run Spark.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>>
>>>
>>
>


-- 
Best Regards

Jeff Zhang

Re: [discuss] dropping Python 2.6 support

Posted by Juliet Hougland <ju...@gmail.com>.
Most admins I talk to about python and spark are already actively (or on
their way to) managing their cluster python installations. Even if people
begin using the system python with pyspark, there is eventually a user who
needs a complex dependency (like pandas or sklearn) on the cluster. No
admin would muck around installing libs into system python, so you end up
with other python installations.
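
One concrete way to see this: probe the executors for the heavy dependency
before committing to the system python. A sketch, with pandas and the app
name as illustrative choices:

    from pyspark import SparkContext

    sc = SparkContext(appName="dependency-probe")

    def probe(_):
        # Runs on the executors, so it reports on the worker interpreters.
        try:
            import pandas
            return "pandas %s" % pandas.__version__
        except ImportError as exc:
            return "missing: %s" % exc

    # One distinct answer per worker interpreter configuration.
    print(sc.parallelize(range(8)).map(probe).distinct().collect())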

Installing a non-system python is something users intending to use pyspark
on a real cluster should be thinking about, eventually, anyway. It would
work in situations where people are running pyspark locally or actively
managing python installations on a cluster. There is an awkward middle
point where someone has installed spark but not configured their cluster
(by installing non default python) in any other way. Most clusters I see
are RHEL/CentOS and have something other than system python used by spark.

What libraries stopped supporting python 2.6 and where does spark use them?
The "ease of transitioning to pyspark onto a cluster" problem may be an
easier pill to swallow if it only affected something like mllib or spark
sql and not parts of the core api. You end up hoping numpy or pandas are
installed in the runtime components of spark anyway. At that point people
really should just go install a non system python. There are tradeoffs to
using pyspark and I feel pretty fine explaining to people that managing
their cluster's python installations is something that comes with using
pyspark.

RHEL/CentOS is so common that this would probably be a little work for a
lot of people.

--Juliet

On Tue, Jan 5, 2016 at 4:07 PM, Koert Kuipers <ko...@tresata.com> wrote:

> hey evil admin:)
> i think the bit about java was from me?
> if so, i meant to indicate that the reality for us is java is 1.7 on most
> (all?) clusters. i do not believe spark prefers java 1.8. my point was that
> even though java 1.7 is getting old as well, it would be a major issue for
> me if spark dropped java 1.7 support.
>
> On Tue, Jan 5, 2016 at 6:53 PM, Carlile, Ken <ca...@janelia.hhmi.org>
> wrote:
>
>> As one of the evil administrators that runs a RHEL 6 cluster, we already
>> provide quite a few different versions of python on our cluster pretty darn
>> easily. All you need is a separate install directory and to set the
>> PYTHON_HOME environment variable to point to the correct python, then have
>> the users make sure the correct python is in their PATH. I understand that
>> other administrators may not be so compliant.
>>
>> Saw a small bit about the java version in there; does Spark currently
>> prefer Java 1.8.x?
>>
>> —Ken
>>
>> On Jan 5, 2016, at 6:08 PM, Josh Rosen <jo...@databricks.com> wrote:
>>
>> Note that you _can_ use a Python 2.7 `ipython` executable on the driver
>>> while continuing to use a vanilla `python` executable on the executors
>>
>>
>> Whoops, just to be clear, this should actually read "while continuing to
>> use a vanilla `python` 2.7 executable".
>>
>> On Tue, Jan 5, 2016 at 3:07 PM, Josh Rosen <jo...@databricks.com>
>> wrote:
>>
>>> Yep, the driver and executors need to have compatible Python versions. I
>>> think that there are some bytecode-level incompatibilities between 2.6 and
>>> 2.7 which would impact the deserialization of Python closures, so I think
>>> you need to be running the same 2.x version for all communicating Spark
>>> processes. Note that you _can_ use a Python 2.7 `ipython` executable on the
>>> driver while continuing to use a vanilla `python` executable on the
>>> executors (we have environment variables which allow you to control these
>>> separately).
>>>
>>> On Tue, Jan 5, 2016 at 3:05 PM, Nicholas Chammas <
>>> nicholas.chammas@gmail.com> wrote:
>>>
>>>> I think all the slaves need the same (or a compatible) version of
>>>> Python installed since they run Python code in PySpark jobs natively.
>>>>
>>>> On Tue, Jan 5, 2016 at 6:02 PM Koert Kuipers <ko...@tresata.com> wrote:
>>>>
>>>>> interesting i didnt know that!
>>>>>
>>>>> On Tue, Jan 5, 2016 at 5:57 PM, Nicholas Chammas <
>>>>> nicholas.chammas@gmail.com> wrote:
>>>>>
>>>>>> even if python 2.7 was needed only on this one machine that launches
>>>>>> the app we can not ship it with our software because its gpl licensed
>>>>>>
>>>>>> Not to nitpick, but maybe this is important. The Python license is GPL-compatible
>>>>>> but not GPL <https://docs.python.org/3/license.html>:
>>>>>>
>>>>>> Note GPL-compatible doesn’t mean that we’re distributing Python under
>>>>>> the GPL. All Python licenses, unlike the GPL, let you distribute a modified
>>>>>> version without making your changes open source. The GPL-compatible
>>>>>> licenses make it possible to combine Python with other software that is
>>>>>> released under the GPL; the others don’t.
>>>>>>
>>>>>> Nick
>>>>>> ​
>>>>>>
>>>>>> On Tue, Jan 5, 2016 at 5:49 PM Koert Kuipers <ko...@tresata.com>
>>>>>> wrote:
>>>>>>
>>>>>>> i do not think so.
>>>>>>>
>>>>>>> does the python 2.7 need to be installed on all slaves? if so, we do
>>>>>>> not have direct access to those.
>>>>>>>
>>>>>>> also, spark is easy for us to ship with our software since its
>>>>>>> apache 2 licensed, and it only needs to be present on the machine that
>>>>>>> launches the app (thanks to yarn).
>>>>>>> even if python 2.7 was needed only on this one machine that launches
>>>>>>> the app we can not ship it with our software because its gpl licensed, so
>>>>>>> the client would have to download it and install it themselves, and this
>>>>>>> would mean its an independent install which has to be audited and approved
>>>>>>> and now you are in for a lot of fun. basically it will never happen.
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jan 5, 2016 at 5:35 PM, Josh Rosen <joshrosen@databricks.com
>>>>>>> > wrote:
>>>>>>>
>>>>>>>> If users are able to install Spark 2.0 on their RHEL clusters, then
>>>>>>>> I imagine that they're also capable of installing a standalone Python
>>>>>>>> alongside that Spark version (without changing Python systemwide). For
>>>>>>>> instance, Anaconda/Miniconda make it really easy to install Python
>>>>>>>> 2.7.x/3.x without impacting / changing the system Python and doesn't
>>>>>>>> require any special permissions to install (you don't need root / sudo
>>>>>>>> access). Does this address the Python versioning concerns for RHEL users?
>>>>>>>>
>>>>>>>> On Tue, Jan 5, 2016 at 2:33 PM, Koert Kuipers <ko...@tresata.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> yeah, the practical concern is that we have no control over java
>>>>>>>>> or python version on large company clusters. our current reality for the
>>>>>>>>> vast majority of them is java 7 and python 2.6, no matter how outdated that
>>>>>>>>> is.
>>>>>>>>>
>>>>>>>>> i dont like it either, but i cannot change it.
>>>>>>>>>
>>>>>>>>> we currently don't use pyspark so i have no stake in this, but if
>>>>>>>>> we did i can assure you we would not upgrade to spark 2.x if python 2.6 was
>>>>>>>>> dropped. no point in developing something that doesnt run for majority of
>>>>>>>>> customers.
>>>>>>>>>
>>>>>>>>> On Tue, Jan 5, 2016 at 5:19 PM, Nicholas Chammas <
>>>>>>>>> nicholas.chammas@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> As I pointed out in my earlier email, RHEL will support Python
>>>>>>>>>> 2.6 until 2020. So I'm assuming these large companies will have the option
>>>>>>>>>> of riding out Python 2.6 until then.
>>>>>>>>>>
>>>>>>>>>> Are we seriously saying that Spark should likewise support Python
>>>>>>>>>> 2.6 for the next several years? Even though the core Python devs stopped
>>>>>>>>>> supporting it in 2013?
>>>>>>>>>>
>>>>>>>>>> If that's not what we're suggesting, then when, roughly, can we
>>>>>>>>>> drop support? What are the criteria?
>>>>>>>>>>
>>>>>>>>>> I understand the practical concern here. If companies are stuck
>>>>>>>>>> using 2.6, it doesn't matter to them that it is deprecated. But balancing
>>>>>>>>>> that concern against the maintenance burden on this project, I would say
>>>>>>>>>> that "upgrade to Python 2.7 or stay on Spark 1.6.x" is a reasonable
>>>>>>>>>> position to take. There are many tiny annoyances one has to put up with to
>>>>>>>>>> support 2.6.
>>>>>>>>>>
>>>>>>>>>> I suppose if our main PySpark contributors are fine putting up
>>>>>>>>>> with those annoyances, then maybe we don't need to drop support just yet...
>>>>>>>>>>
>>>>>>>>>> Nick
>>>>>>>>>> On Tue, Jan 5, 2016 at 2:27 PM, Julio Antonio Soto de Vicente <
>>>>>>>>>> julio@esbet.es> wrote:
>>>>>>>>>>
>>>>>>>>>>> Unfortunately, Koert is right.
>>>>>>>>>>>
>>>>>>>>>>> I've been in a couple of projects using Spark (banking industry)
>>>>>>>>>>> where CentOS + Python 2.6 is the toolbox available.
>>>>>>>>>>>
>>>>>>>>>>> That said, I believe it should not be a concern for Spark.
>>>>>>>>>>> Python 2.6 is old and busted, which is totally opposite to the Spark
>>>>>>>>>>> philosophy IMO.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Jan 5, 2016, at 8:07 PM, Koert Kuipers <ko...@tresata.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> rhel/centos 6 ships with python 2.6, doesnt it?
>>>>>>>>>>>
>>>>>>>>>>> if so, i still know plenty of large companies where python 2.6
>>>>>>>>>>> is the only option. asking them for python 2.7 is not going to work
>>>>>>>>>>>
>>>>>>>>>>> so i think its a bad idea
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jan 5, 2016 at 1:52 PM, Juliet Hougland <
>>>>>>>>>>> juliet.hougland@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I don't see a reason Spark 2.0 would need to support Python
>>>>>>>>>>>> 2.6. At this point, Python 3 should be the default that is encouraged.
>>>>>>>>>>>> Most organizations acknowledge that 2.7 is common, but lagging
>>>>>>>>>>>> behind the version they should theoretically use. Dropping python 2.6
>>>>>>>>>>>> support sounds very reasonable to me.
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jan 5, 2016 at 5:45 AM, Nicholas Chammas <
>>>>>>>>>>>> nicholas.chammas@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> +1
>>>>>>>>>>>>>
>>>>>>>>>>>>> Red Hat supports Python 2.6 on RHEL 5 until 2020
>>>>>>>>>>>>> <https://alexgaynor.net/2015/mar/30/red-hat-open-source-community/>,
>>>>>>>>>>>>> but otherwise yes, Python 2.6 is ancient history and the core Python
>>>>>>>>>>>>> developers stopped supporting it in 2013. RHEL 5 is not a good enough
>>>>>>>>>>>>> reason to continue support for Python 2.6 IMO.
>>>>>>>>>>>>>
>>>>>>>>>>>>> We should aim to support Python 2.7 and Python 3.3+ (which I
>>>>>>>>>>>>> believe we currently do).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Nick
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Jan 5, 2016 at 8:01 AM Allen Zhang <
>>>>>>>>>>>>> allenzhang010@126.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> plus 1,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> we are currently using python 2.7.2 in production environment.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 2016-01-05 18:11:45, "Meethu Mathew" <
>>>>>>>>>>>>>> meethu.mathew@flytxt.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +1
>>>>>>>>>>>>>> We use Python 2.7
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Meethu Mathew
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Jan 5, 2016 at 12:47 PM, Reynold Xin <
>>>>>>>>>>>>>> rxin@databricks.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Does anybody here care about us dropping support for Python
>>>>>>>>>>>>>>> 2.6 in Spark 2.0?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Python 2.6 is ancient, and is pretty slow in many aspects
>>>>>>>>>>>>>>> (e.g. json parsing) when compared with Python 2.7. Some libraries that
>>>>>>>>>>>>>>> Spark depend on stopped supporting 2.6. We can still convince the library
>>>>>>>>>>>>>>> maintainers to support 2.6, but it will be extra work. I'm curious if
>>>>>>>>>>>>>>> anybody still uses Python 2.6 to run Spark.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>
>>
>>
>

Re: [discuss] dropping Python 2.6 support

Posted by Josh Rosen <jo...@databricks.com>.
I don't think that we're planning to drop Java 7 support for Spark 2.0.

Personally, I would recommend using Java 8 if you're running Spark 1.5.0+
and are using SQL/DataFrames so that you can benefit from improvements to
code cache flushing in the Java 8 JVMs. Spark SQL's generated classes can
fill up the JVM's code cache, which causes JIT to stop working for new
bytecode. Empirically, it looks like the Java 8 JVMs have an improved
ability to flush this code cache, thereby avoiding this problem.

TL;DR: I'd prefer to run Java 8 with Spark if given the choice.
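
For anyone who has to stay on Java 7 and hits this, the relevant knob can
also be raised explicitly. A hedged sketch: -XX:ReservedCodeCacheSize is a
standard HotSpot flag, but the 512m value is illustrative, not a tuned
recommendation:

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .set("spark.executor.extraJavaOptions",
                 "-XX:ReservedCodeCacheSize=512m"))
    # The driver-side equivalent has to be passed at launch time
    # (spark-defaults.conf or --driver-java-options), because the driver
    # JVM is already running when this code executes.
    sc = SparkContext(conf=conf)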

On Tue, Jan 5, 2016 at 4:07 PM, Koert Kuipers <ko...@tresata.com> wrote:

> hey evil admin:)
> i think the bit about java was from me?
> if so, i meant to indicate that the reality for us is java is 1.7 on most
> (all?) clusters. i do not believe spark prefers java 1.8. my point was that
> even though java 1.7 is getting old as well, it would be a major issue for
> me if spark dropped java 1.7 support.
>
> On Tue, Jan 5, 2016 at 6:53 PM, Carlile, Ken <ca...@janelia.hhmi.org>
> wrote:
>
>> As one of the evil administrators that runs a RHEL 6 cluster, we already
>> provide quite a few different versions of python on our cluster pretty darn
>> easily. All you need is a separate install directory and to set the
>> PYTHON_HOME environment variable to point to the correct python, then have
>> the users make sure the correct python is in their PATH. I understand that
>> other administrators may not be so compliant.
>>
>> Saw a small bit about the java version in there; does Spark currently
>> prefer Java 1.8.x?
>>
>> —Ken
>>
>> On Jan 5, 2016, at 6:08 PM, Josh Rosen <jo...@databricks.com> wrote:
>>
>> Note that you _can_ use a Python 2.7 `ipython` executable on the driver
>>> while continuing to use a vanilla `python` executable on the executors
>>
>>
>> Whoops, just to be clear, this should actually read "while continuing to
>> use a vanilla `python` 2.7 executable".
>>
>> On Tue, Jan 5, 2016 at 3:07 PM, Josh Rosen <jo...@databricks.com>
>> wrote:
>>
>>> Yep, the driver and executors need to have compatible Python versions. I
>>> think that there are some bytecode-level incompatibilities between 2.6 and
>>> 2.7 which would impact the deserialization of Python closures, so I think
>>> you need to be running the same 2.x version for all communicating Spark
>>> processes. Note that you _can_ use a Python 2.7 `ipython` executable on the
>>> driver while continuing to use a vanilla `python` executable on the
>>> executors (we have environment variables which allow you to control these
>>> separately).
>>>
>>> On Tue, Jan 5, 2016 at 3:05 PM, Nicholas Chammas <
>>> nicholas.chammas@gmail.com> wrote:
>>>
>>>> I think all the slaves need the same (or a compatible) version of
>>>> Python installed since they run Python code in PySpark jobs natively.
>>>>
>>>> On Tue, Jan 5, 2016 at 6:02 PM Koert Kuipers <ko...@tresata.com> wrote:
>>>>
>>>>> interesting i didnt know that!
>>>>>
>>>>> On Tue, Jan 5, 2016 at 5:57 PM, Nicholas Chammas <
>>>>> nicholas.chammas@gmail.com> wrote:
>>>>>
>>>>>> even if python 2.7 was needed only on this one machine that launches
>>>>>> the app we can not ship it with our software because its gpl licensed
>>>>>>
>>>>>> Not to nitpick, but maybe this is important. The Python license is GPL-compatible
>>>>>> but not GPL <https://docs.python.org/3/license.html>:
>>>>>>
>>>>>> Note GPL-compatible doesn’t mean that we’re distributing Python under
>>>>>> the GPL. All Python licenses, unlike the GPL, let you distribute a modified
>>>>>> version without making your changes open source. The GPL-compatible
>>>>>> licenses make it possible to combine Python with other software that is
>>>>>> released under the GPL; the others don’t.
>>>>>>
>>>>>> Nick
>>>>>> ​
>>>>>>
>>>>>> On Tue, Jan 5, 2016 at 5:49 PM Koert Kuipers <ko...@tresata.com>
>>>>>> wrote:
>>>>>>
>>>>>>> i do not think so.
>>>>>>>
>>>>>>> does the python 2.7 need to be installed on all slaves? if so, we do
>>>>>>> not have direct access to those.
>>>>>>>
>>>>>>> also, spark is easy for us to ship with our software since its
>>>>>>> apache 2 licensed, and it only needs to be present on the machine that
>>>>>>> launches the app (thanks to yarn).
>>>>>>> even if python 2.7 was needed only on this one machine that launches
>>>>>>> the app we can not ship it with our software because its gpl licensed, so
>>>>>>> the client would have to download it and install it themselves, and this
>>>>>>> would mean its an independent install which has to be audited and approved
>>>>>>> and now you are in for a lot of fun. basically it will never happen.
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jan 5, 2016 at 5:35 PM, Josh Rosen <joshrosen@databricks.com
>>>>>>> > wrote:
>>>>>>>
>>>>>>>> If users are able to install Spark 2.0 on their RHEL clusters, then
>>>>>>>> I imagine that they're also capable of installing a standalone Python
>>>>>>>> alongside that Spark version (without changing Python systemwide). For
>>>>>>>> instance, Anaconda/Miniconda make it really easy to install Python
>>>>>>>> 2.7.x/3.x without impacting / changing the system Python and doesn't
>>>>>>>> require any special permissions to install (you don't need root / sudo
>>>>>>>> access). Does this address the Python versioning concerns for RHEL users?
>>>>>>>>
>>>>>>>> On Tue, Jan 5, 2016 at 2:33 PM, Koert Kuipers <ko...@tresata.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> yeah, the practical concern is that we have no control over java
>>>>>>>>> or python version on large company clusters. our current reality for the
>>>>>>>>> vast majority of them is java 7 and python 2.6, no matter how outdated that
>>>>>>>>> is.
>>>>>>>>>
>>>>>>>>> i dont like it either, but i cannot change it.
>>>>>>>>>
>>>>>>>>> we currently don't use pyspark so i have no stake in this, but if
>>>>>>>>> we did i can assure you we would not upgrade to spark 2.x if python 2.6 was
>>>>>>>>> dropped. no point in developing something that doesnt run for majority of
>>>>>>>>> customers.
>>>>>>>>>
>>>>>>>>> On Tue, Jan 5, 2016 at 5:19 PM, Nicholas Chammas <
>>>>>>>>> nicholas.chammas@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> As I pointed out in my earlier email, RHEL will support Python
>>>>>>>>>> 2.6 until 2020. So I'm assuming these large companies will have the option
>>>>>>>>>> of riding out Python 2.6 until then.
>>>>>>>>>>
>>>>>>>>>> Are we seriously saying that Spark should likewise support Python
>>>>>>>>>> 2.6 for the next several years? Even though the core Python devs stopped
>>>>>>>>>> supporting it in 2013?
>>>>>>>>>>
>>>>>>>>>> If that's not what we're suggesting, then when, roughly, can we
>>>>>>>>>> drop support? What are the criteria?
>>>>>>>>>>
>>>>>>>>>> I understand the practical concern here. If companies are stuck
>>>>>>>>>> using 2.6, it doesn't matter to them that it is deprecated. But balancing
>>>>>>>>>> that concern against the maintenance burden on this project, I would say
>>>>>>>>>> that "upgrade to Python 2.7 or stay on Spark 1.6.x" is a reasonable
>>>>>>>>>> position to take. There are many tiny annoyances one has to put up with to
>>>>>>>>>> support 2.6.
>>>>>>>>>>
>>>>>>>>>> I suppose if our main PySpark contributors are fine putting up
>>>>>>>>>> with those annoyances, then maybe we don't need to drop support just yet...
>>>>>>>>>>
>>>>>>>>>> Nick
>>>>>>>>>> On Tue, Jan 5, 2016 at 2:27 PM, Julio Antonio Soto de Vicente <
>>>>>>>>>> julio@esbet.es> wrote:
>>>>>>>>>>
>>>>>>>>>>> Unfortunately, Koert is right.
>>>>>>>>>>>
>>>>>>>>>>> I've been in a couple of projects using Spark (banking industry)
>>>>>>>>>>> where CentOS + Python 2.6 is the toolbox available.
>>>>>>>>>>>
>>>>>>>>>>> That said, I believe it should not be a concern for Spark.
>>>>>>>>>>> Python 2.6 is old and busted, which is totally opposite to the Spark
>>>>>>>>>>> philosophy IMO.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Jan 5, 2016, at 8:07 PM, Koert Kuipers <ko...@tresata.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> rhel/centos 6 ships with python 2.6, doesnt it?
>>>>>>>>>>>
>>>>>>>>>>> if so, i still know plenty of large companies where python 2.6
>>>>>>>>>>> is the only option. asking them for python 2.7 is not going to work
>>>>>>>>>>>
>>>>>>>>>>> so i think its a bad idea
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jan 5, 2016 at 1:52 PM, Juliet Hougland <
>>>>>>>>>>> juliet.hougland@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I don't see a reason Spark 2.0 would need to support Python
>>>>>>>>>>>> 2.6. At this point, Python 3 should be the default that is encouraged.
>>>>>>>>>>>> Most organizations acknowledge that 2.7 is common, but lagging
>>>>>>>>>>>> behind the version they should theoretically use. Dropping python 2.6
>>>>>>>>>>>> support sounds very reasonable to me.
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jan 5, 2016 at 5:45 AM, Nicholas Chammas <
>>>>>>>>>>>> nicholas.chammas@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> +1
>>>>>>>>>>>>>
>>>>>>>>>>>>> Red Hat supports Python 2.6 on RHEL 5 until 2020
>>>>>>>>>>>>> <https://alexgaynor.net/2015/mar/30/red-hat-open-source-community/>,
>>>>>>>>>>>>> but otherwise yes, Python 2.6 is ancient history and the core Python
>>>>>>>>>>>>> developers stopped supporting it in 2013. RHEL 5 is not a good enough
>>>>>>>>>>>>> reason to continue support for Python 2.6 IMO.
>>>>>>>>>>>>>
>>>>>>>>>>>>> We should aim to support Python 2.7 and Python 3.3+ (which I
>>>>>>>>>>>>> believe we currently do).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Nick
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Jan 5, 2016 at 8:01 AM Allen Zhang <
>>>>>>>>>>>>> allenzhang010@126.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> plus 1,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> we are currently using python 2.7.2 in production environment.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 2016-01-05 18:11:45, "Meethu Mathew" <
>>>>>>>>>>>>>> meethu.mathew@flytxt.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +1
>>>>>>>>>>>>>> We use Python 2.7
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Meethu Mathew
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Jan 5, 2016 at 12:47 PM, Reynold Xin <
>>>>>>>>>>>>>> rxin@databricks.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Does anybody here care about us dropping support for Python
>>>>>>>>>>>>>>> 2.6 in Spark 2.0?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Python 2.6 is ancient, and is pretty slow in many aspects
>>>>>>>>>>>>>>> (e.g. json parsing) when compared with Python 2.7. Some libraries that
>>>>>>>>>>>>>>> Spark depend on stopped supporting 2.6. We can still convince the library
>>>>>>>>>>>>>>> maintainers to support 2.6, but it will be extra work. I'm curious if
>>>>>>>>>>>>>>> anybody still uses Python 2.6 to run Spark.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>
>>
>>
>

Re: [discuss] dropping Python 2.6 support

Posted by Josh Rosen <jo...@databricks.com>.
I don't think that we're planning to drop Java 7 support for Spark 2.0.

Personally, I would recommend using Java 8 if you're running Spark 1.5.0+
and are using SQL/DataFrames so that you can benefit from improvements to
code cache flushing in the Java 8 JVMs. Spark SQL's generated classes can
fill up the JVM's code cache, which causes JIT to stop working for new
bytecode. Empirically, it looks like the Java 8 JVMs have an improved
ability to flush this code cache, thereby avoiding this problem.

TL;DR: I'd prefer to run Java 8 with Spark if given the choice.

On Tue, Jan 5, 2016 at 4:07 PM, Koert Kuipers <ko...@tresata.com> wrote:

> hey evil admin:)
> i think the bit about java was from me?
> if so, i meant to indicate that the reality for us is java is 1.7 on most
> (all?) clusters. i do not believe spark prefers java 1.8. my point was that
> even although java 1.7 is getting old as well it would be a major issue for
> me if spark dropped java 1.7 support.
>
> On Tue, Jan 5, 2016 at 6:53 PM, Carlile, Ken <ca...@janelia.hhmi.org>
> wrote:
>
>> As one of the evil administrators that runs a RHEL 6 cluster, we already
>> provide quite a few different version of python on our cluster pretty darn
>> easily. All you need is a separate install directory and to set the
>> PYTHON_HOME environment variable to point to the correct python, then have
>> the users make sure the correct python is in their PATH. I understand that
>> other administrators may not be so compliant.
>>
>> Saw a small bit about the java version in there; does Spark currently
>> prefer Java 1.8.x?
>>
>> —Ken
>>
>> On Jan 5, 2016, at 6:08 PM, Josh Rosen <jo...@databricks.com> wrote:
>>
>> Note that you _can_ use a Python 2.7 `ipython` executable on the driver
>>> while continuing to use a vanilla `python` executable on the executors
>>
>>
>> Whoops, just to be clear, this should actually read "while continuing to
>> use a vanilla `python` 2.7 executable".
>>
>> On Tue, Jan 5, 2016 at 3:07 PM, Josh Rosen <jo...@databricks.com>
>> wrote:
>>
>>> Yep, the driver and executors need to have compatible Python versions. I
>>> think that there are some bytecode-level incompatibilities between 2.6 and
>>> 2.7 which would impact the deserialization of Python closures, so I think
>>> you need to be running the same 2.x version for all communicating Spark
>>> processes. Note that you _can_ use a Python 2.7 `ipython` executable on the
>>> driver while continuing to use a vanilla `python` executable on the
>>> executors (we have environment variables which allow you to control these
>>> separately).
>>>
>>> On Tue, Jan 5, 2016 at 3:05 PM, Nicholas Chammas <
>>> nicholas.chammas@gmail.com> wrote:
>>>
>>>> I think all the slaves need the same (or a compatible) version of
>>>> Python installed since they run Python code in PySpark jobs natively.
>>>>
>>>> On Tue, Jan 5, 2016 at 6:02 PM Koert Kuipers <ko...@tresata.com> wrote:
>>>>
>>>>> interesting i didnt know that!
>>>>>
>>>>> On Tue, Jan 5, 2016 at 5:57 PM, Nicholas Chammas <
>>>>> nicholas.chammas@gmail.com> wrote:
>>>>>
>>>>>> even if python 2.7 was needed only on this one machine that launches
>>>>>> the app we can not ship it with our software because its gpl licensed
>>>>>>
>>>>>> Not to nitpick, but maybe this is important. The Python license is GPL-compatible
>>>>>> but not GPL <https://docs.python.org/3/license.html>:
>>>>>>
>>>>>> Note GPL-compatible doesn’t mean that we’re distributing Python under
>>>>>> the GPL. All Python licenses, unlike the GPL, let you distribute a modified
>>>>>> version without making your changes open source. The GPL-compatible
>>>>>> licenses make it possible to combine Python with other software that is
>>>>>> released under the GPL; the others don’t.
>>>>>>
>>>>>> Nick
>>>>>>
>>>>>>
>>>>>> On Tue, Jan 5, 2016 at 5:49 PM Koert Kuipers <ko...@tresata.com>
>>>>>> wrote:
>>>>>>
>>>>>>> i do not think so.
>>>>>>>
>>>>>>> does the python 2.7 need to be installed on all slaves? if so, we do
>>>>>>> not have direct access to those.
>>>>>>>
>>>>>>> also, spark is easy for us to ship with our software since its
>>>>>>> apache 2 licensed, and it only needs to be present on the machine that
>>>>>>> launches the app (thanks to yarn).
>>>>>>> even if python 2.7 was needed only on this one machine that launches
>>>>>>> the app we can not ship it with our software because its gpl licensed, so
>>>>>>> the client would have to download it and install it themselves, and this
>>>>>>> would mean its an independent install which has to be audited and approved
>>>>>>> and now you are in for a lot of fun. basically it will never happen.
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jan 5, 2016 at 5:35 PM, Josh Rosen <joshrosen@databricks.com
>>>>>>> > wrote:
>>>>>>>
>>>>>>>> If users are able to install Spark 2.0 on their RHEL clusters, then
>>>>>>>> I imagine that they're also capable of installing a standalone Python
>>>>>>>> alongside that Spark version (without changing Python systemwide). For
>>>>>>>> instance, Anaconda/Miniconda make it really easy to install Python
>>>>>>>> 2.7.x/3.x without impacting / changing the system Python and doesn't
>>>>>>>> require any special permissions to install (you don't need root / sudo
>>>>>>>> access). Does this address the Python versioning concerns for RHEL users?
>>>>>>>>
>>>>>>>> On Tue, Jan 5, 2016 at 2:33 PM, Koert Kuipers <ko...@tresata.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> yeah, the practical concern is that we have no control over java
>>>>>>>>> or python version on large company clusters. our current reality for the
>>>>>>>>> vast majority of them is java 7 and python 2.6, no matter how outdated that
>>>>>>>>> is.
>>>>>>>>>
>>>>>>>>> i dont like it either, but i cannot change it.
>>>>>>>>>
>>>>>>>>> we currently don't use pyspark so i have no stake in this, but if
>>>>>>>>> we did i can assure you we would not upgrade to spark 2.x if python 2.6 was
>>>>>>>>> dropped. no point in developing something that doesnt run for majority of
>>>>>>>>> customers.
>>>>>>>>>
>>>>>>>>> On Tue, Jan 5, 2016 at 5:19 PM, Nicholas Chammas <
>>>>>>>>> nicholas.chammas@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> As I pointed out in my earlier email, RHEL will support Python
>>>>>>>>>> 2.6 until 2020. So I'm assuming these large companies will have the option
>>>>>>>>>> of riding out Python 2.6 until then.
>>>>>>>>>>
>>>>>>>>>> Are we seriously saying that Spark should likewise support Python
>>>>>>>>>> 2.6 for the next several years? Even though the core Python devs stopped
>>>>>>>>>> supporting it in 2013?
>>>>>>>>>>
>>>>>>>>>> If that's not what we're suggesting, then when, roughly, can we
>>>>>>>>>> drop support? What are the criteria?
>>>>>>>>>>
>>>>>>>>>> I understand the practical concern here. If companies are stuck
>>>>>>>>>> using 2.6, it doesn't matter to them that it is deprecated. But balancing
>>>>>>>>>> that concern against the maintenance burden on this project, I would say
>>>>>>>>>> that "upgrade to Python 2.7 or stay on Spark 1.6.x" is a reasonable
>>>>>>>>>> position to take. There are many tiny annoyances one has to put up with to
>>>>>>>>>> support 2.6.
>>>>>>>>>>
>>>>>>>>>> I suppose if our main PySpark contributors are fine putting up
>>>>>>>>>> with those annoyances, then maybe we don't need to drop support just yet...
>>>>>>>>>>
>>>>>>>>>> Nick
>>>>>>>>>> On Tue, Jan 5, 2016 at 2:27 PM, Julio Antonio Soto de Vicente <
>>>>>>>>>> julio@esbet.es> wrote:
>>>>>>>>>>
>>>>>>>>>>> Unfortunately, Koert is right.
>>>>>>>>>>>
>>>>>>>>>>> I've been in a couple of projects using Spark (banking industry)
>>>>>>>>>>> where CentOS + Python 2.6 is the toolbox available.
>>>>>>>>>>>
>>>>>>>>>>> That said, I believe it should not be a concern for Spark.
>>>>>>>>>>> Python 2.6 is old and busted, which is totally opposite to the Spark
>>>>>>>>>>> philosophy IMO.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> El 5 ene 2016, a las 20:07, Koert Kuipers <ko...@tresata.com>
>>>>>>>>>>> escribió:
>>>>>>>>>>>
>>>>>>>>>>> rhel/centos 6 ships with python 2.6, doesnt it?
>>>>>>>>>>>
>>>>>>>>>>> if so, i still know plenty of large companies where python 2.6
>>>>>>>>>>> is the only option. asking them for python 2.7 is not going to work
>>>>>>>>>>>
>>>>>>>>>>> so i think its a bad idea
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jan 5, 2016 at 1:52 PM, Juliet Hougland <
>>>>>>>>>>> juliet.hougland@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I don't see a reason Spark 2.0 would need to support Python
>>>>>>>>>>>> 2.6. At this point, Python 3 should be the default that is encouraged.
>>>>>>>>>>>> Most organizations acknowledge that 2.7 is common, but lagging
>>>>>>>>>>>> behind the version they should theoretically use. Dropping python 2.6
>>>>>>>>>>>> support sounds very reasonable to me.
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jan 5, 2016 at 5:45 AM, Nicholas Chammas <
>>>>>>>>>>>> nicholas.chammas@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> +1
>>>>>>>>>>>>>
>>>>>>>>>>>>> Red Hat supports Python 2.6 on RHEL 5 until 2020
>>>>>>>>>>>>> <https://alexgaynor.net/2015/mar/30/red-hat-open-source-community/>,
>>>>>>>>>>>>> but otherwise yes, Python 2.6 is ancient history and the core Python
>>>>>>>>>>>>> developers stopped supporting it in 2013. RHEL 5 is not a good enough
>>>>>>>>>>>>> reason to continue support for Python 2.6 IMO.
>>>>>>>>>>>>>
>>>>>>>>>>>>> We should aim to support Python 2.7 and Python 3.3+ (which I
>>>>>>>>>>>>> believe we currently do).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Nick
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Jan 5, 2016 at 8:01 AM Allen Zhang <
>>>>>>>>>>>>> allenzhang010@126.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> plus 1,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> we are currently using python 2.7.2 in production environment.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 在 2016-01-05 18:11:45,"Meethu Mathew" <
>>>>>>>>>>>>>> meethu.mathew@flytxt.com> 写道:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +1
>>>>>>>>>>>>>> We use Python 2.7
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Meethu Mathew
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Jan 5, 2016 at 12:47 PM, Reynold Xin <
>>>>>>>>>>>>>> rxin@databricks.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Does anybody here care about us dropping support for Python
>>>>>>>>>>>>>>> 2.6 in Spark 2.0?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Python 2.6 is ancient, and is pretty slow in many aspects
>>>>>>>>>>>>>>> (e.g. json parsing) when compared with Python 2.7. Some libraries that
>>>>>>>>>>>>>>> Spark depend on stopped supporting 2.6. We can still convince the library
>>>>>>>>>>>>>>> maintainers to support 2.6, but it will be extra work. I'm curious if
>>>>>>>>>>>>>>> anybody still uses Python 2.6 to run Spark.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>
>>
>>
>

Re: [discuss] dropping Python 2.6 support

Posted by Koert Kuipers <ko...@tresata.com>.
hey evil admin:)
i think the bit about java was from me?
if so, i meant to indicate that the reality for us is java 1.7 on most
(all?) clusters. i do not believe spark prefers java 1.8. my point was that
even though java 1.7 is getting old as well, it would be a major issue for
me if spark dropped java 1.7 support.

On Tue, Jan 5, 2016 at 6:53 PM, Carlile, Ken <ca...@janelia.hhmi.org>
wrote:

> As one of the evil administrators that runs a RHEL 6 cluster, we already
> provide quite a few different versions of python on our cluster pretty darn
> easily. All you need is a separate install directory and to set the
> PYTHON_HOME environment variable to point to the correct python, then have
> the users make sure the correct python is in their PATH. I understand that
> other administrators may not be so compliant.
>
> Saw a small bit about the java version in there; does Spark currently
> prefer Java 1.8.x?
>
> —Ken
>
> On Jan 5, 2016, at 6:08 PM, Josh Rosen <jo...@databricks.com> wrote:
>
> Note that you _can_ use a Python 2.7 `ipython` executable on the driver
>> while continuing to use a vanilla `python` executable on the executors
>
>
> Whoops, just to be clear, this should actually read "while continuing to
> use a vanilla `python` 2.7 executable".
>
> On Tue, Jan 5, 2016 at 3:07 PM, Josh Rosen <jo...@databricks.com>
> wrote:
>
>> Yep, the driver and executors need to have compatible Python versions. I
>> think that there are some bytecode-level incompatibilities between 2.6 and
>> 2.7 which would impact the deserialization of Python closures, so I think
>> you need to be running the same 2.x version for all communicating Spark
>> processes. Note that you _can_ use a Python 2.7 `ipython` executable on the
>> driver while continuing to use a vanilla `python` executable on the
>> executors (we have environment variables which allow you to control these
>> separately).
>>
>> On Tue, Jan 5, 2016 at 3:05 PM, Nicholas Chammas <
>> nicholas.chammas@gmail.com> wrote:
>>
>>> I think all the slaves need the same (or a compatible) version of Python
>>> installed since they run Python code in PySpark jobs natively.
>>>
>>> On Tue, Jan 5, 2016 at 6:02 PM Koert Kuipers <ko...@tresata.com> wrote:
>>>
>>>> interesting i didnt know that!
>>>>
>>>> On Tue, Jan 5, 2016 at 5:57 PM, Nicholas Chammas <
>>>> nicholas.chammas@gmail.com> wrote:
>>>>
>>>>> even if python 2.7 was needed only on this one machine that launches
>>>>> the app we can not ship it with our software because its gpl licensed
>>>>>
>>>>> Not to nitpick, but maybe this is important. The Python license is GPL-compatible
>>>>> but not GPL <https://docs.python.org/3/license.html>:
>>>>>
>>>>> Note GPL-compatible doesn’t mean that we’re distributing Python under
>>>>> the GPL. All Python licenses, unlike the GPL, let you distribute a modified
>>>>> version without making your changes open source. The GPL-compatible
>>>>> licenses make it possible to combine Python with other software that is
>>>>> released under the GPL; the others don’t.
>>>>>
>>>>> Nick
>>>>>
>>>>>
>>>>> On Tue, Jan 5, 2016 at 5:49 PM Koert Kuipers <ko...@tresata.com>
>>>>> wrote:
>>>>>
>>>>>> i do not think so.
>>>>>>
>>>>>> does the python 2.7 need to be installed on all slaves? if so, we do
>>>>>> not have direct access to those.
>>>>>>
>>>>>> also, spark is easy for us to ship with our software since its apache
>>>>>> 2 licensed, and it only needs to be present on the machine that launches
>>>>>> the app (thanks to yarn).
>>>>>> even if python 2.7 was needed only on this one machine that launches
>>>>>> the app we can not ship it with our software because its gpl licensed, so
>>>>>> the client would have to download it and install it themselves, and this
>>>>>> would mean its an independent install which has to be audited and approved
>>>>>> and now you are in for a lot of fun. basically it will never happen.
>>>>>>
>>>>>>
>>>>>> On Tue, Jan 5, 2016 at 5:35 PM, Josh Rosen <jo...@databricks.com>
>>>>>> wrote:
>>>>>>
>>>>>>> If users are able to install Spark 2.0 on their RHEL clusters, then
>>>>>>> I imagine that they're also capable of installing a standalone Python
>>>>>>> alongside that Spark version (without changing Python systemwide). For
>>>>>>> instance, Anaconda/Miniconda make it really easy to install Python
>>>>>>> 2.7.x/3.x without impacting / changing the system Python and doesn't
>>>>>>> require any special permissions to install (you don't need root / sudo
>>>>>>> access). Does this address the Python versioning concerns for RHEL users?
>>>>>>>
>>>>>>> On Tue, Jan 5, 2016 at 2:33 PM, Koert Kuipers <ko...@tresata.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> yeah, the practical concern is that we have no control over java or
>>>>>>>> python version on large company clusters. our current reality for the vast
>>>>>>>> majority of them is java 7 and python 2.6, no matter how outdated that is.
>>>>>>>>
>>>>>>>> i dont like it either, but i cannot change it.
>>>>>>>>
>>>>>>>> we currently don't use pyspark so i have no stake in this, but if
>>>>>>>> we did i can assure you we would not upgrade to spark 2.x if python 2.6 was
>>>>>>>> dropped. no point in developing something that doesnt run for majority of
>>>>>>>> customers.
>>>>>>>>
>>>>>>>> On Tue, Jan 5, 2016 at 5:19 PM, Nicholas Chammas <
>>>>>>>> nicholas.chammas@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> As I pointed out in my earlier email, RHEL will support Python 2.6
>>>>>>>>> until 2020. So I'm assuming these large companies will have the option of
>>>>>>>>> riding out Python 2.6 until then.
>>>>>>>>>
>>>>>>>>> Are we seriously saying that Spark should likewise support Python
>>>>>>>>> 2.6 for the next several years? Even though the core Python devs stopped
>>>>>>>>> supporting it in 2013?
>>>>>>>>>
>>>>>>>>> If that's not what we're suggesting, then when, roughly, can we
>>>>>>>>> drop support? What are the criteria?
>>>>>>>>>
>>>>>>>>> I understand the practical concern here. If companies are stuck
>>>>>>>>> using 2.6, it doesn't matter to them that it is deprecated. But balancing
>>>>>>>>> that concern against the maintenance burden on this project, I would say
>>>>>>>>> that "upgrade to Python 2.7 or stay on Spark 1.6.x" is a reasonable
>>>>>>>>> position to take. There are many tiny annoyances one has to put up with to
>>>>>>>>> support 2.6.
>>>>>>>>>
>>>>>>>>> I suppose if our main PySpark contributors are fine putting up
>>>>>>>>> with those annoyances, then maybe we don't need to drop support just yet...
>>>>>>>>>
>>>>>>>>> Nick
>>>>>>>>> On Tue, Jan 5, 2016 at 2:27 PM, Julio Antonio Soto de Vicente <
>>>>>>>>> julio@esbet.es> wrote:
>>>>>>>>>
>>>>>>>>>> Unfortunately, Koert is right.
>>>>>>>>>>
>>>>>>>>>> I've been in a couple of projects using Spark (banking industry)
>>>>>>>>>> where CentOS + Python 2.6 is the toolbox available.
>>>>>>>>>>
>>>>>>>>>> That said, I believe it should not be a concern for Spark. Python
>>>>>>>>>> 2.6 is old and busted, which is totally opposite to the Spark philosophy
>>>>>>>>>> IMO.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> El 5 ene 2016, a las 20:07, Koert Kuipers <ko...@tresata.com>
>>>>>>>>>> escribió:
>>>>>>>>>>
>>>>>>>>>> rhel/centos 6 ships with python 2.6, doesnt it?
>>>>>>>>>>
>>>>>>>>>> if so, i still know plenty of large companies where python 2.6 is
>>>>>>>>>> the only option. asking them for python 2.7 is not going to work
>>>>>>>>>>
>>>>>>>>>> so i think its a bad idea
>>>>>>>>>>
>>>>>>>>>> On Tue, Jan 5, 2016 at 1:52 PM, Juliet Hougland <
>>>>>>>>>> juliet.hougland@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> I don't see a reason Spark 2.0 would need to support Python 2.6.
>>>>>>>>>>> At this point, Python 3 should be the default that is encouraged.
>>>>>>>>>>> Most organizations acknowledge that 2.7 is common, but lagging
>>>>>>>>>>> behind the version they should theoretically use. Dropping python 2.6
>>>>>>>>>>> support sounds very reasonable to me.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jan 5, 2016 at 5:45 AM, Nicholas Chammas <
>>>>>>>>>>> nicholas.chammas@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> +1
>>>>>>>>>>>>
>>>>>>>>>>>> Red Hat supports Python 2.6 on RHEL 5 until 2020
>>>>>>>>>>>> <https://alexgaynor.net/2015/mar/30/red-hat-open-source-community/>,
>>>>>>>>>>>> but otherwise yes, Python 2.6 is ancient history and the core Python
>>>>>>>>>>>> developers stopped supporting it in 2013. RHEL 5 is not a good enough
>>>>>>>>>>>> reason to continue support for Python 2.6 IMO.
>>>>>>>>>>>>
>>>>>>>>>>>> We should aim to support Python 2.7 and Python 3.3+ (which I
>>>>>>>>>>>> believe we currently do).
>>>>>>>>>>>>
>>>>>>>>>>>> Nick
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jan 5, 2016 at 8:01 AM Allen Zhang <
>>>>>>>>>>>> allenzhang010@126.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> plus 1,
>>>>>>>>>>>>>
>>>>>>>>>>>>> we are currently using python 2.7.2 in production environment.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> 在 2016-01-05 18:11:45,"Meethu Mathew" <
>>>>>>>>>>>>> meethu.mathew@flytxt.com> 写道:
>>>>>>>>>>>>>
>>>>>>>>>>>>> +1
>>>>>>>>>>>>> We use Python 2.7
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Meethu Mathew
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Jan 5, 2016 at 12:47 PM, Reynold Xin <
>>>>>>>>>>>>> rxin@databricks.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Does anybody here care about us dropping support for Python
>>>>>>>>>>>>>> 2.6 in Spark 2.0?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Python 2.6 is ancient, and is pretty slow in many aspects
>>>>>>>>>>>>>> (e.g. json parsing) when compared with Python 2.7. Some libraries that
>>>>>>>>>>>>>> Spark depend on stopped supporting 2.6. We can still convince the library
>>>>>>>>>>>>>> maintainers to support 2.6, but it will be extra work. I'm curious if
>>>>>>>>>>>>>> anybody still uses Python 2.6 to run Spark.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>
>
>

Re: [discuss] dropping Python 2.6 support

Posted by Josh Rosen <jo...@databricks.com>.
>
> Note that you _can_ use a Python 2.7 `ipython` executable on the driver
> while continuing to use a vanilla `python` executable on the executors


Whoops, just to be clear, this should actually read "while continuing to
use a vanilla `python` 2.7 executable".

On Tue, Jan 5, 2016 at 3:07 PM, Josh Rosen <jo...@databricks.com> wrote:

> Yep, the driver and executors need to have compatible Python versions. I
> think that there are some bytecode-level incompatibilities between 2.6 and
> 2.7 which would impact the deserialization of Python closures, so I think
> you need to be running the same 2.x version for all communicating Spark
> processes. Note that you _can_ use a Python 2.7 `ipython` executable on the
> driver while continuing to use a vanilla `python` executable on the
> executors (we have environment variables which allow you to control these
> separately).
>
> On Tue, Jan 5, 2016 at 3:05 PM, Nicholas Chammas <
> nicholas.chammas@gmail.com> wrote:
>
>> I think all the slaves need the same (or a compatible) version of Python
>> installed since they run Python code in PySpark jobs natively.
>>
>> On Tue, Jan 5, 2016 at 6:02 PM Koert Kuipers <ko...@tresata.com> wrote:
>>
>>> interesting i didnt know that!
>>>
>>> On Tue, Jan 5, 2016 at 5:57 PM, Nicholas Chammas <
>>> nicholas.chammas@gmail.com> wrote:
>>>
>>>> even if python 2.7 was needed only on this one machine that launches
>>>> the app we can not ship it with our software because its gpl licensed
>>>>
>>>> Not to nitpick, but maybe this is important. The Python license is GPL-compatible
>>>> but not GPL <https://docs.python.org/3/license.html>:
>>>>
>>>> Note GPL-compatible doesn’t mean that we’re distributing Python under
>>>> the GPL. All Python licenses, unlike the GPL, let you distribute a modified
>>>> version without making your changes open source. The GPL-compatible
>>>> licenses make it possible to combine Python with other software that is
>>>> released under the GPL; the others don’t.
>>>>
>>>> Nick
>>>>
>>>>
>>>> On Tue, Jan 5, 2016 at 5:49 PM Koert Kuipers <ko...@tresata.com> wrote:
>>>>
>>>>> i do not think so.
>>>>>
>>>>> does the python 2.7 need to be installed on all slaves? if so, we do
>>>>> not have direct access to those.
>>>>>
>>>>> also, spark is easy for us to ship with our software since its apache
>>>>> 2 licensed, and it only needs to be present on the machine that launches
>>>>> the app (thanks to yarn).
>>>>> even if python 2.7 was needed only on this one machine that launches
>>>>> the app we can not ship it with our software because its gpl licensed, so
>>>>> the client would have to download it and install it themselves, and this
>>>>> would mean its an independent install which has to be audited and approved
>>>>> and now you are in for a lot of fun. basically it will never happen.
>>>>>
>>>>>
>>>>> On Tue, Jan 5, 2016 at 5:35 PM, Josh Rosen <jo...@databricks.com>
>>>>> wrote:
>>>>>
>>>>>> If users are able to install Spark 2.0 on their RHEL clusters, then I
>>>>>> imagine that they're also capable of installing a standalone Python
>>>>>> alongside that Spark version (without changing Python systemwide). For
>>>>>> instance, Anaconda/Miniconda make it really easy to install Python
>>>>>> 2.7.x/3.x without impacting / changing the system Python and doesn't
>>>>>> require any special permissions to install (you don't need root / sudo
>>>>>> access). Does this address the Python versioning concerns for RHEL users?
>>>>>>
>>>>>> On Tue, Jan 5, 2016 at 2:33 PM, Koert Kuipers <ko...@tresata.com>
>>>>>> wrote:
>>>>>>
>>>>>>> yeah, the practical concern is that we have no control over java or
>>>>>>> python version on large company clusters. our current reality for the vast
>>>>>>> majority of them is java 7 and python 2.6, no matter how outdated that is.
>>>>>>>
>>>>>>> i dont like it either, but i cannot change it.
>>>>>>>
>>>>>>> we currently don't use pyspark so i have no stake in this, but if we
>>>>>>> did i can assure you we would not upgrade to spark 2.x if python 2.6 was
>>>>>>> dropped. no point in developing something that doesnt run for majority of
>>>>>>> customers.
>>>>>>>
>>>>>>> On Tue, Jan 5, 2016 at 5:19 PM, Nicholas Chammas <
>>>>>>> nicholas.chammas@gmail.com> wrote:
>>>>>>>
>>>>>>>> As I pointed out in my earlier email, RHEL will support Python 2.6
>>>>>>>> until 2020. So I'm assuming these large companies will have the option of
>>>>>>>> riding out Python 2.6 until then.
>>>>>>>>
>>>>>>>> Are we seriously saying that Spark should likewise support Python
>>>>>>>> 2.6 for the next several years? Even though the core Python devs stopped
>>>>>>>> supporting it in 2013?
>>>>>>>>
>>>>>>>> If that's not what we're suggesting, then when, roughly, can we
>>>>>>>> drop support? What are the criteria?
>>>>>>>>
>>>>>>>> I understand the practical concern here. If companies are stuck
>>>>>>>> using 2.6, it doesn't matter to them that it is deprecated. But balancing
>>>>>>>> that concern against the maintenance burden on this project, I would say
>>>>>>>> that "upgrade to Python 2.7 or stay on Spark 1.6.x" is a reasonable
>>>>>>>> position to take. There are many tiny annoyances one has to put up with to
>>>>>>>> support 2.6.
>>>>>>>>
>>>>>>>> I suppose if our main PySpark contributors are fine putting up with
>>>>>>>> those annoyances, then maybe we don't need to drop support just yet...
>>>>>>>>
>>>>>>>> Nick
>>>>>>>> On Tue, Jan 5, 2016 at 2:27 PM, Julio Antonio Soto de Vicente <
>>>>>>>> julio@esbet.es> wrote:
>>>>>>>>
>>>>>>>>> Unfortunately, Koert is right.
>>>>>>>>>
>>>>>>>>> I've been in a couple of projects using Spark (banking industry)
>>>>>>>>> where CentOS + Python 2.6 is the toolbox available.
>>>>>>>>>
>>>>>>>>> That said, I believe it should not be a concern for Spark. Python
>>>>>>>>> 2.6 is old and busted, which is totally opposite to the Spark philosophy
>>>>>>>>> IMO.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> El 5 ene 2016, a las 20:07, Koert Kuipers <ko...@tresata.com>
>>>>>>>>> escribió:
>>>>>>>>>
>>>>>>>>> rhel/centos 6 ships with python 2.6, doesnt it?
>>>>>>>>>
>>>>>>>>> if so, i still know plenty of large companies where python 2.6 is
>>>>>>>>> the only option. asking them for python 2.7 is not going to work
>>>>>>>>>
>>>>>>>>> so i think its a bad idea
>>>>>>>>>
>>>>>>>>> On Tue, Jan 5, 2016 at 1:52 PM, Juliet Hougland <
>>>>>>>>> juliet.hougland@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> I don't see a reason Spark 2.0 would need to support Python 2.6.
>>>>>>>>>> At this point, Python 3 should be the default that is encouraged.
>>>>>>>>>> Most organizations acknowledge that 2.7 is common, but lagging
>>>>>>>>>> behind the version they should theoretically use. Dropping python 2.6
>>>>>>>>>> support sounds very reasonable to me.
>>>>>>>>>>
>>>>>>>>>> On Tue, Jan 5, 2016 at 5:45 AM, Nicholas Chammas <
>>>>>>>>>> nicholas.chammas@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> +1
>>>>>>>>>>>
>>>>>>>>>>> Red Hat supports Python 2.6 on RHEL 5 until 2020
>>>>>>>>>>> <https://alexgaynor.net/2015/mar/30/red-hat-open-source-community/>,
>>>>>>>>>>> but otherwise yes, Python 2.6 is ancient history and the core Python
>>>>>>>>>>> developers stopped supporting it in 2013. RHEL 5 is not a good enough
>>>>>>>>>>> reason to continue support for Python 2.6 IMO.
>>>>>>>>>>>
>>>>>>>>>>> We should aim to support Python 2.7 and Python 3.3+ (which I
>>>>>>>>>>> believe we currently do).
>>>>>>>>>>>
>>>>>>>>>>> Nick
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jan 5, 2016 at 8:01 AM Allen Zhang <
>>>>>>>>>>> allenzhang010@126.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> plus 1,
>>>>>>>>>>>>
>>>>>>>>>>>> we are currently using python 2.7.2 in production environment.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> 在 2016-01-05 18:11:45,"Meethu Mathew" <me...@flytxt.com>
>>>>>>>>>>>> 写道:
>>>>>>>>>>>>
>>>>>>>>>>>> +1
>>>>>>>>>>>> We use Python 2.7
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>>
>>>>>>>>>>>> Meethu Mathew
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jan 5, 2016 at 12:47 PM, Reynold Xin <
>>>>>>>>>>>> rxin@databricks.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Does anybody here care about us dropping support for Python
>>>>>>>>>>>>> 2.6 in Spark 2.0?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Python 2.6 is ancient, and is pretty slow in many aspects
>>>>>>>>>>>>> (e.g. json parsing) when compared with Python 2.7. Some libraries that
>>>>>>>>>>>>> Spark depend on stopped supporting 2.6. We can still convince the library
>>>>>>>>>>>>> maintainers to support 2.6, but it will be extra work. I'm curious if
>>>>>>>>>>>>> anybody still uses Python 2.6 to run Spark.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>
>

Re: [discuss] dropping Python 2.6 support

Posted by Josh Rosen <jo...@databricks.com>.
Yep, the driver and executors need to have compatible Python versions. I
think that there are some bytecode-level incompatibilities between 2.6 and
2.7 which would impact the deserialization of Python closures, so I think
you need to be running the same 2.x version for all communicating Spark
processes. Note that you _can_ use a Python 2.7 `ipython` executable on the
driver while continuing to use a vanilla `python` executable on the
executors (we have environment variables which allow you to control these
separately).
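
For example (a sketch of the usual setup; the interpreter paths here are
illustrative, but PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are the
variables I'm referring to):

    export PYSPARK_PYTHON=/usr/bin/python2.7   # interpreter used by the executors
    export PYSPARK_DRIVER_PYTHON=ipython       # interpreter used only by the driver
    ./bin/pyspark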

On Tue, Jan 5, 2016 at 3:05 PM, Nicholas Chammas <nicholas.chammas@gmail.com
> wrote:

> I think all the slaves need the same (or a compatible) version of Python
> installed since they run Python code in PySpark jobs natively.
>
> On Tue, Jan 5, 2016 at 6:02 PM Koert Kuipers <ko...@tresata.com> wrote:
>
>> interesting i didnt know that!
>>
>> On Tue, Jan 5, 2016 at 5:57 PM, Nicholas Chammas <
>> nicholas.chammas@gmail.com> wrote:
>>
>>> even if python 2.7 was needed only on this one machine that launches the
>>> app we can not ship it with our software because its gpl licensed
>>>
>>> Not to nitpick, but maybe this is important. The Python license is GPL-compatible
>>> but not GPL <https://docs.python.org/3/license.html>:
>>>
>>> Note GPL-compatible doesn’t mean that we’re distributing Python under
>>> the GPL. All Python licenses, unlike the GPL, let you distribute a modified
>>> version without making your changes open source. The GPL-compatible
>>> licenses make it possible to combine Python with other software that is
>>> released under the GPL; the others don’t.
>>>
>>> Nick
>>>
>>>
>>> On Tue, Jan 5, 2016 at 5:49 PM Koert Kuipers <ko...@tresata.com> wrote:
>>>
>>>> i do not think so.
>>>>
>>>> does the python 2.7 need to be installed on all slaves? if so, we do
>>>> not have direct access to those.
>>>>
>>>> also, spark is easy for us to ship with our software since its apache 2
>>>> licensed, and it only needs to be present on the machine that launches the
>>>> app (thanks to yarn).
>>>> even if python 2.7 was needed only on this one machine that launches
>>>> the app we can not ship it with our software because its gpl licensed, so
>>>> the client would have to download it and install it themselves, and this
>>>> would mean its an independent install which has to be audited and approved
>>>> and now you are in for a lot of fun. basically it will never happen.
>>>>
>>>>
>>>> On Tue, Jan 5, 2016 at 5:35 PM, Josh Rosen <jo...@databricks.com>
>>>> wrote:
>>>>
>>>>> If users are able to install Spark 2.0 on their RHEL clusters, then I
>>>>> imagine that they're also capable of installing a standalone Python
>>>>> alongside that Spark version (without changing Python systemwide). For
>>>>> instance, Anaconda/Miniconda make it really easy to install Python
>>>>> 2.7.x/3.x without impacting / changing the system Python and doesn't
>>>>> require any special permissions to install (you don't need root / sudo
>>>>> access). Does this address the Python versioning concerns for RHEL users?
>>>>>
>>>>> On Tue, Jan 5, 2016 at 2:33 PM, Koert Kuipers <ko...@tresata.com>
>>>>> wrote:
>>>>>
>>>>>> yeah, the practical concern is that we have no control over java or
>>>>>> python version on large company clusters. our current reality for the vast
>>>>>> majority of them is java 7 and python 2.6, no matter how outdated that is.
>>>>>>
>>>>>> i dont like it either, but i cannot change it.
>>>>>>
>>>>>> we currently don't use pyspark so i have no stake in this, but if we
>>>>>> did i can assure you we would not upgrade to spark 2.x if python 2.6 was
>>>>>> dropped. no point in developing something that doesnt run for majority of
>>>>>> customers.
>>>>>>
>>>>>> On Tue, Jan 5, 2016 at 5:19 PM, Nicholas Chammas <
>>>>>> nicholas.chammas@gmail.com> wrote:
>>>>>>
>>>>>>> As I pointed out in my earlier email, RHEL will support Python 2.6
>>>>>>> until 2020. So I'm assuming these large companies will have the option of
>>>>>>> riding out Python 2.6 until then.
>>>>>>>
>>>>>>> Are we seriously saying that Spark should likewise support Python
>>>>>>> 2.6 for the next several years? Even though the core Python devs stopped
>>>>>>> supporting it in 2013?
>>>>>>>
>>>>>>> If that's not what we're suggesting, then when, roughly, can we drop
>>>>>>> support? What are the criteria?
>>>>>>>
>>>>>>> I understand the practical concern here. If companies are stuck
>>>>>>> using 2.6, it doesn't matter to them that it is deprecated. But balancing
>>>>>>> that concern against the maintenance burden on this project, I would say
>>>>>>> that "upgrade to Python 2.7 or stay on Spark 1.6.x" is a reasonable
>>>>>>> position to take. There are many tiny annoyances one has to put up with to
>>>>>>> support 2.6.
>>>>>>>
>>>>>>> I suppose if our main PySpark contributors are fine putting up with
>>>>>>> those annoyances, then maybe we don't need to drop support just yet...
>>>>>>>
>>>>>>> Nick
>>>>>>> On Tue, Jan 5, 2016 at 2:27 PM, Julio Antonio Soto de Vicente <
>>>>>>> julio@esbet.es> wrote:
>>>>>>>
>>>>>>>> Unfortunately, Koert is right.
>>>>>>>>
>>>>>>>> I've been in a couple of projects using Spark (banking industry)
>>>>>>>> where CentOS + Python 2.6 is the toolbox available.
>>>>>>>>
>>>>>>>> That said, I believe it should not be a concern for Spark. Python
>>>>>>>> 2.6 is old and busted, which is totally opposite to the Spark philosophy
>>>>>>>> IMO.
>>>>>>>>
>>>>>>>>
>>>>>>>> El 5 ene 2016, a las 20:07, Koert Kuipers <ko...@tresata.com>
>>>>>>>> escribió:
>>>>>>>>
>>>>>>>> rhel/centos 6 ships with python 2.6, doesnt it?
>>>>>>>>
>>>>>>>> if so, i still know plenty of large companies where python 2.6 is
>>>>>>>> the only option. asking them for python 2.7 is not going to work
>>>>>>>>
>>>>>>>> so i think its a bad idea
>>>>>>>>
>>>>>>>> On Tue, Jan 5, 2016 at 1:52 PM, Juliet Hougland <
>>>>>>>> juliet.hougland@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> I don't see a reason Spark 2.0 would need to support Python 2.6.
>>>>>>>>> At this point, Python 3 should be the default that is encouraged.
>>>>>>>>> Most organizations acknowledge that 2.7 is common, but lagging
>>>>>>>>> behind the version they should theoretically use. Dropping python 2.6
>>>>>>>>> support sounds very reasonable to me.
>>>>>>>>>
>>>>>>>>> On Tue, Jan 5, 2016 at 5:45 AM, Nicholas Chammas <
>>>>>>>>> nicholas.chammas@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> +1
>>>>>>>>>>
>>>>>>>>>> Red Hat supports Python 2.6 on RHEL 5 until 2020
>>>>>>>>>> <https://alexgaynor.net/2015/mar/30/red-hat-open-source-community/>,
>>>>>>>>>> but otherwise yes, Python 2.6 is ancient history and the core Python
>>>>>>>>>> developers stopped supporting it in 2013. RHEL 5 is not a good enough
>>>>>>>>>> reason to continue support for Python 2.6 IMO.
>>>>>>>>>>
>>>>>>>>>> We should aim to support Python 2.7 and Python 3.3+ (which I
>>>>>>>>>> believe we currently do).
>>>>>>>>>>
>>>>>>>>>> Nick
>>>>>>>>>>
>>>>>>>>>> On Tue, Jan 5, 2016 at 8:01 AM Allen Zhang <al...@126.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> plus 1,
>>>>>>>>>>>
>>>>>>>>>>> we are currently using python 2.7.2 in production environment.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 在 2016-01-05 18:11:45,"Meethu Mathew" <me...@flytxt.com>
>>>>>>>>>>> 写道:
>>>>>>>>>>>
>>>>>>>>>>> +1
>>>>>>>>>>> We use Python 2.7
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>>
>>>>>>>>>>> Meethu Mathew
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jan 5, 2016 at 12:47 PM, Reynold Xin <
>>>>>>>>>>> rxin@databricks.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Does anybody here care about us dropping support for Python 2.6
>>>>>>>>>>>> in Spark 2.0?
>>>>>>>>>>>>
>>>>>>>>>>>> Python 2.6 is ancient, and is pretty slow in many aspects (e.g.
>>>>>>>>>>>> json parsing) when compared with Python 2.7. Some libraries that Spark
>>>>>>>>>>>> depend on stopped supporting 2.6. We can still convince the library
>>>>>>>>>>>> maintainers to support 2.6, but it will be extra work. I'm curious if
>>>>>>>>>>>> anybody still uses Python 2.6 to run Spark.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>
>>>>
>>

Re: [discuss] dropping Python 2.6 support

Posted by Josh Rosen <jo...@databricks.com>.
Yep, the driver and executors need to have compatible Python versions. I
think that there are some bytecode-level incompatibilities between 2.6 and
2.7 which would impact the deserialization of Python closures, so I think
you need to be running the same 2.x version for all communicating Spark
processes. Note that you _can_ use a Python 2.7 `ipython` executable on the
driver while continuing to use a vanilla `python` executable on the
executors (we have environment variables which allow you to control these
separately).

On Tue, Jan 5, 2016 at 3:05 PM, Nicholas Chammas <nicholas.chammas@gmail.com
> wrote:

> I think all the slaves need the same (or a compatible) version of Python
> installed since they run Python code in PySpark jobs natively.
>
> On Tue, Jan 5, 2016 at 6:02 PM Koert Kuipers <ko...@tresata.com> wrote:
>
>> interesting i didnt know that!
>>
>> On Tue, Jan 5, 2016 at 5:57 PM, Nicholas Chammas <
>> nicholas.chammas@gmail.com> wrote:
>>
>>> even if python 2.7 was needed only on this one machine that launches the
>>> app we can not ship it with our software because its gpl licensed
>>>
>>> Not to nitpick, but maybe this is important. The Python license is GPL-compatible
>>> but not GPL <https://docs.python.org/3/license.html>:
>>>
>>> Note GPL-compatible doesn’t mean that we’re distributing Python under
>>> the GPL. All Python licenses, unlike the GPL, let you distribute a modified
>>> version without making your changes open source. The GPL-compatible
>>> licenses make it possible to combine Python with other software that is
>>> released under the GPL; the others don’t.
>>>
>>> Nick
>>> ​
>>>
>>> On Tue, Jan 5, 2016 at 5:49 PM Koert Kuipers <ko...@tresata.com> wrote:
>>>
>>>> i do not think so.
>>>>
>>>> does the python 2.7 need to be installed on all slaves? if so, we do
>>>> not have direct access to those.
>>>>
>>>> also, spark is easy for us to ship with our software since its apache 2
>>>> licensed, and it only needs to be present on the machine that launches the
>>>> app (thanks to yarn).
>>>> even if python 2.7 was needed only on this one machine that launches
>>>> the app we can not ship it with our software because its gpl licensed, so
>>>> the client would have to download it and install it themselves, and this
>>>> would mean its an independent install which has to be audited and approved
>>>> and now you are in for a lot of fun. basically it will never happen.
>>>>
>>>>
>>>> On Tue, Jan 5, 2016 at 5:35 PM, Josh Rosen <jo...@databricks.com>
>>>> wrote:
>>>>
>>>>> If users are able to install Spark 2.0 on their RHEL clusters, then I
>>>>> imagine that they're also capable of installing a standalone Python
>>>>> alongside that Spark version (without changing Python systemwide). For
>>>>> instance, Anaconda/Miniconda make it really easy to install Python
>>>>> 2.7.x/3.x without impacting / changing the system Python and doesn't
>>>>> require any special permissions to install (you don't need root / sudo
>>>>> access). Does this address the Python versioning concerns for RHEL users?
>>>>>
>>>>> On Tue, Jan 5, 2016 at 2:33 PM, Koert Kuipers <ko...@tresata.com>
>>>>> wrote:
>>>>>
>>>>>> yeah, the practical concern is that we have no control over java or
>>>>>> python version on large company clusters. our current reality for the vast
>>>>>> majority of them is java 7 and python 2.6, no matter how outdated that is.
>>>>>>
>>>>>> i dont like it either, but i cannot change it.
>>>>>>
>>>>>> we currently don't use pyspark so i have no stake in this, but if we
>>>>>> did i can assure you we would not upgrade to spark 2.x if python 2.6 was
>>>>>> dropped. no point in developing something that doesnt run for majority of
>>>>>> customers.
>>>>>>
>>>>>> On Tue, Jan 5, 2016 at 5:19 PM, Nicholas Chammas <
>>>>>> nicholas.chammas@gmail.com> wrote:
>>>>>>
>>>>>>> As I pointed out in my earlier email, RHEL will support Python 2.6
>>>>>>> until 2020. So I'm assuming these large companies will have the option of
>>>>>>> riding out Python 2.6 until then.
>>>>>>>
>>>>>>> Are we seriously saying that Spark should likewise support Python
>>>>>>> 2.6 for the next several years? Even though the core Python devs stopped
>>>>>>> supporting it in 2013?
>>>>>>>
>>>>>>> If that's not what we're suggesting, then when, roughly, can we drop
>>>>>>> support? What are the criteria?
>>>>>>>
>>>>>>> I understand the practical concern here. If companies are stuck
>>>>>>> using 2.6, it doesn't matter to them that it is deprecated. But balancing
>>>>>>> that concern against the maintenance burden on this project, I would say
>>>>>>> that "upgrade to Python 2.7 or stay on Spark 1.6.x" is a reasonable
>>>>>>> position to take. There are many tiny annoyances one has to put up with to
>>>>>>> support 2.6.
>>>>>>>
>>>>>>> I suppose if our main PySpark contributors are fine putting up with
>>>>>>> those annoyances, then maybe we don't need to drop support just yet...
>>>>>>>
>>>>>>> Nick
>>>>>>> On Tue, Jan 5, 2016 at 2:27 PM, Julio Antonio Soto de Vicente <
>>>>>>> julio@esbet.es> wrote:
>>>>>>>
>>>>>>>> Unfortunately, Koert is right.
>>>>>>>>
>>>>>>>> I've been in a couple of projects using Spark (banking industry)
>>>>>>>> where CentOS + Python 2.6 is the toolbox available.
>>>>>>>>
>>>>>>>> That said, I believe it should not be a concern for Spark. Python
>>>>>>>> 2.6 is old and busted, which is totally opposite to the Spark philosophy
>>>>>>>> IMO.
>>>>>>>>
>>>>>>>>
>>>>>>>> El 5 ene 2016, a las 20:07, Koert Kuipers <ko...@tresata.com>
>>>>>>>> escribió:
>>>>>>>>
>>>>>>>> rhel/centos 6 ships with python 2.6, doesnt it?
>>>>>>>>
>>>>>>>> if so, i still know plenty of large companies where python 2.6 is
>>>>>>>> the only option. asking them for python 2.7 is not going to work
>>>>>>>>
>>>>>>>> so i think its a bad idea
>>>>>>>>
>>>>>>>> On Tue, Jan 5, 2016 at 1:52 PM, Juliet Hougland <
>>>>>>>> juliet.hougland@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> I don't see a reason Spark 2.0 would need to support Python 2.6.
>>>>>>>>> At this point, Python 3 should be the default that is encouraged.
>>>>>>>>> Most organizations acknowledge that 2.7 is common, but lagging
>>>>>>>>> behind the version they should theoretically use. Dropping python 2.6
>>>>>>>>> support sounds very reasonable to me.
>>>>>>>>>
>>>>>>>>> On Tue, Jan 5, 2016 at 5:45 AM, Nicholas Chammas <
>>>>>>>>> nicholas.chammas@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> +1
>>>>>>>>>>
>>>>>>>>>> Red Hat supports Python 2.6 on RHEL 5 until 2020
>>>>>>>>>> <https://alexgaynor.net/2015/mar/30/red-hat-open-source-community/>,
>>>>>>>>>> but otherwise yes, Python 2.6 is ancient history and the core Python
>>>>>>>>>> developers stopped supporting it in 2013. RHEL 5 is not a good enough
>>>>>>>>>> reason to continue support for Python 2.6 IMO.
>>>>>>>>>>
>>>>>>>>>> We should aim to support Python 2.7 and Python 3.3+ (which I
>>>>>>>>>> believe we currently do).
>>>>>>>>>>
>>>>>>>>>> Nick
>>>>>>>>>>
>>>>>>>>>> On Tue, Jan 5, 2016 at 8:01 AM Allen Zhang <al...@126.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> plus 1,
>>>>>>>>>>>
>>>>>>>>>>> we are currently using python 2.7.2 in production environment.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 2016-01-05 18:11:45, "Meethu Mathew" <me...@flytxt.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> +1
>>>>>>>>>>> We use Python 2.7
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>>
>>>>>>>>>>> Meethu Mathew
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jan 5, 2016 at 12:47 PM, Reynold Xin <
>>>>>>>>>>> rxin@databricks.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Does anybody here care about us dropping support for Python 2.6
>>>>>>>>>>>> in Spark 2.0?
>>>>>>>>>>>>
>>>>>>>>>>>> Python 2.6 is ancient, and is pretty slow in many aspects (e.g.
>>>>>>>>>>>> json parsing) when compared with Python 2.7. Some libraries that Spark
>>>>>>>>>>>> depends on stopped supporting 2.6. We can still convince the library
>>>>>>>>>>>> maintainers to support 2.6, but it will be extra work. I'm curious if
>>>>>>>>>>>> anybody still uses Python 2.6 to run Spark.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>
>>>>
>>

Re: [discuss] dropping Python 2.6 support

Posted by Nicholas Chammas <ni...@gmail.com>.
I think all the slaves need the same (or a compatible) version of Python
installed since they run Python code in PySpark jobs natively.
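
A quick way to verify what the slaves actually run is to sample the
interpreter version inside tasks. A sketch, assuming a live SparkContext
`sc` (e.g. from the pyspark shell):

    import platform

    # Each task executes in a Python worker on an executor, so collecting
    # platform.python_version() across partitions reveals every interpreter
    # version in use on the cluster.
    versions = (sc.parallelize(range(100), 20)
                  .map(lambda _: platform.python_version())
                  .distinct()
                  .collect())
    print(versions)  # e.g. ['2.6.6'] on a stock CentOS 6 cluster

More than one version in that list, or a version that differs from the
driver's, is a recipe for confusing failures.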

On Tue, Jan 5, 2016 at 6:02 PM Koert Kuipers <ko...@tresata.com> wrote:

> interesting i didnt know that!
>
> On Tue, Jan 5, 2016 at 5:57 PM, Nicholas Chammas <
> nicholas.chammas@gmail.com> wrote:
>
>> even if python 2.7 was needed only on this one machine that launches the
>> app we can not ship it with our software because its gpl licensed
>>
>> Not to nitpick, but maybe this is important. The Python license is GPL-compatible
>> but not GPL <https://docs.python.org/3/license.html>:
>>
>> Note GPL-compatible doesn’t mean that we’re distributing Python under the
>> GPL. All Python licenses, unlike the GPL, let you distribute a modified
>> version without making your changes open source. The GPL-compatible
>> licenses make it possible to combine Python with other software that is
>> released under the GPL; the others don’t.
>>
>> Nick
>>
>> On Tue, Jan 5, 2016 at 5:49 PM Koert Kuipers <ko...@tresata.com> wrote:
>>
>>> i do not think so.
>>>
>>> does the python 2.7 need to be installed on all slaves? if so, we do not
>>> have direct access to those.
>>>
>>> also, spark is easy for us to ship with our software since its apache 2
>>> licensed, and it only needs to be present on the machine that launches the
>>> app (thanks to yarn).
>>> even if python 2.7 was needed only on this one machine that launches the
>>> app we can not ship it with our software because its gpl licensed, so the
>>> client would have to download it and install it themselves, and this would
>>> mean its an independent install which has to be audited and approved and
>>> now you are in for a lot of fun. basically it will never happen.
>>>
>>>
>>> On Tue, Jan 5, 2016 at 5:35 PM, Josh Rosen <jo...@databricks.com>
>>> wrote:
>>>
>>>> If users are able to install Spark 2.0 on their RHEL clusters, then I
>>>> imagine that they're also capable of installing a standalone Python
>>>> alongside that Spark version (without changing Python systemwide). For
>>>> instance, Anaconda/Miniconda make it really easy to install Python
>>>> 2.7.x/3.x without impacting / changing the system Python and don't
>>>> require any special permissions to install (you don't need root / sudo
>>>> access). Does this address the Python versioning concerns for RHEL users?
>>>>
>>>> On Tue, Jan 5, 2016 at 2:33 PM, Koert Kuipers <ko...@tresata.com>
>>>> wrote:
>>>>
>>>>> yeah, the practical concern is that we have no control over java or
>>>>> python version on large company clusters. our current reality for the vast
>>>>> majority of them is java 7 and python 2.6, no matter how outdated that is.
>>>>>
>>>>> i dont like it either, but i cannot change it.
>>>>>
>>>>> we currently don't use pyspark so i have no stake in this, but if we
>>>>> did i can assure you we would not upgrade to spark 2.x if python 2.6 was
>>>>> dropped. no point in developing something that doesnt run for majority of
>>>>> customers.
>>>>>
>>>>> On Tue, Jan 5, 2016 at 5:19 PM, Nicholas Chammas <
>>>>> nicholas.chammas@gmail.com> wrote:
>>>>>
>>>>>> As I pointed out in my earlier email, RHEL will support Python 2.6
>>>>>> until 2020. So I'm assuming these large companies will have the option of
>>>>>> riding out Python 2.6 until then.
>>>>>>
>>>>>> Are we seriously saying that Spark should likewise support Python 2.6
>>>>>> for the next several years? Even though the core Python devs stopped
>>>>>> supporting it in 2013?
>>>>>>
>>>>>> If that's not what we're suggesting, then when, roughly, can we drop
>>>>>> support? What are the criteria?
>>>>>>
>>>>>> I understand the practical concern here. If companies are stuck using
>>>>>> 2.6, it doesn't matter to them that it is deprecated. But balancing that
>>>>>> concern against the maintenance burden on this project, I would say that
>>>>>> "upgrade to Python 2.7 or stay on Spark 1.6.x" is a reasonable position to
>>>>>> take. There are many tiny annoyances one has to put up with to support 2.6.
>>>>>>
>>>>>> I suppose if our main PySpark contributors are fine putting up with
>>>>>> those annoyances, then maybe we don't need to drop support just yet...
>>>>>>
>>>>>> Nick
>>>>>> On Tue, Jan 5, 2016 at 2:27 PM, Julio Antonio Soto de Vicente <
>>>>>> julio@esbet.es> wrote:
>>>>>>
>>>>>>> Unfortunately, Koert is right.
>>>>>>>
>>>>>>> I've been in a couple of projects using Spark (banking industry)
>>>>>>> where CentOS + Python 2.6 is the toolbox available.
>>>>>>>
>>>>>>> That said, I believe it should not be a concern for Spark. Python
>>>>>>> 2.6 is old and busted, which is totally opposite to the Spark philosophy
>>>>>>> IMO.
>>>>>>>
>>>>>>>
>>>>>>> El 5 ene 2016, a las 20:07, Koert Kuipers <ko...@tresata.com>
>>>>>>> escribió:
>>>>>>>
>>>>>>> rhel/centos 6 ships with python 2.6, doesnt it?
>>>>>>>
>>>>>>> if so, i still know plenty of large companies where python 2.6 is
>>>>>>> the only option. asking them for python 2.7 is not going to work
>>>>>>>
>>>>>>> so i think its a bad idea
>>>>>>>
>>>>>>> On Tue, Jan 5, 2016 at 1:52 PM, Juliet Hougland <
>>>>>>> juliet.hougland@gmail.com> wrote:
>>>>>>>
>>>>>>>> I don't see a reason Spark 2.0 would need to support Python 2.6. At
>>>>>>>> this point, Python 3 should be the default that is encouraged.
>>>>>>>> Most organizations acknowledge that 2.7 is common, but lagging
>>>>>>>> behind the version they should theoretically use. Dropping python 2.6
>>>>>>>> support sounds very reasonable to me.
>>>>>>>>
>>>>>>>> On Tue, Jan 5, 2016 at 5:45 AM, Nicholas Chammas <
>>>>>>>> nicholas.chammas@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> +1
>>>>>>>>>
>>>>>>>>> Red Hat supports Python 2.6 on RHEL 5 until 2020
>>>>>>>>> <https://alexgaynor.net/2015/mar/30/red-hat-open-source-community/>,
>>>>>>>>> but otherwise yes, Python 2.6 is ancient history and the core Python
>>>>>>>>> developers stopped supporting it in 2013. RHEL 5 is not a good enough
>>>>>>>>> reason to continue support for Python 2.6 IMO.
>>>>>>>>>
>>>>>>>>> We should aim to support Python 2.7 and Python 3.3+ (which I
>>>>>>>>> believe we currently do).
>>>>>>>>>
>>>>>>>>> Nick
>>>>>>>>>
>>>>>>>>> On Tue, Jan 5, 2016 at 8:01 AM Allen Zhang <al...@126.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> plus 1,
>>>>>>>>>>
>>>>>>>>>> we are currently using python 2.7.2 in production environment.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 2016-01-05 18:11:45, "Meethu Mathew" <me...@flytxt.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> +1
>>>>>>>>>> We use Python 2.7
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>>
>>>>>>>>>> Meethu Mathew
>>>>>>>>>>
>>>>>>>>>> On Tue, Jan 5, 2016 at 12:47 PM, Reynold Xin <rxin@databricks.com
>>>>>>>>>> > wrote:
>>>>>>>>>>
>>>>>>>>>>> Does anybody here care about us dropping support for Python 2.6
>>>>>>>>>>> in Spark 2.0?
>>>>>>>>>>>
>>>>>>>>>>> Python 2.6 is ancient, and is pretty slow in many aspects (e.g.
>>>>>>>>>>> json parsing) when compared with Python 2.7. Some libraries that Spark
>>>>>>>>>>> depends on stopped supporting 2.6. We can still convince the library
>>>>>>>>>>> maintainers to support 2.6, but it will be extra work. I'm curious if
>>>>>>>>>>> anybody still uses Python 2.6 to run Spark.
>>>>>>>>>>>
>>>>>>>>>>> Thanks.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>

Re: [discuss] dropping Python 2.6 support

Posted by Koert Kuipers <ko...@tresata.com>.
interesting i didnt know that!

On Tue, Jan 5, 2016 at 5:57 PM, Nicholas Chammas <nicholas.chammas@gmail.com
> wrote:

> even if python 2.7 was needed only on this one machine that launches the
> app we can not ship it with our software because its gpl licensed
>
> Not to nitpick, but maybe this is important. The Python license is GPL-compatible
> but not GPL <https://docs.python.org/3/license.html>:
>
> Note GPL-compatible doesn’t mean that we’re distributing Python under the
> GPL. All Python licenses, unlike the GPL, let you distribute a modified
> version without making your changes open source. The GPL-compatible
> licenses make it possible to combine Python with other software that is
> released under the GPL; the others don’t.
>
> Nick
>
> On Tue, Jan 5, 2016 at 5:49 PM Koert Kuipers <ko...@tresata.com> wrote:
>
>> i do not think so.
>>
>> does the python 2.7 need to be installed on all slaves? if so, we do not
>> have direct access to those.
>>
>> also, spark is easy for us to ship with our software since its apache 2
>> licensed, and it only needs to be present on the machine that launches the
>> app (thanks to yarn).
>> even if python 2.7 was needed only on this one machine that launches the
>> app we can not ship it with our software because its gpl licensed, so the
>> client would have to download it and install it themselves, and this would
>> mean its an independent install which has to be audited and approved and
>> now you are in for a lot of fun. basically it will never happen.
>>
>>
>> On Tue, Jan 5, 2016 at 5:35 PM, Josh Rosen <jo...@databricks.com>
>> wrote:
>>
>>> If users are able to install Spark 2.0 on their RHEL clusters, then I
>>> imagine that they're also capable of installing a standalone Python
>>> alongside that Spark version (without changing Python systemwide). For
>>> instance, Anaconda/Miniconda make it really easy to install Python
>>> 2.7.x/3.x without impacting / changing the system Python and don't
>>> require any special permissions to install (you don't need root / sudo
>>> access). Does this address the Python versioning concerns for RHEL users?
>>>
>>> On Tue, Jan 5, 2016 at 2:33 PM, Koert Kuipers <ko...@tresata.com> wrote:
>>>
>>>> yeah, the practical concern is that we have no control over java or
>>>> python version on large company clusters. our current reality for the vast
>>>> majority of them is java 7 and python 2.6, no matter how outdated that is.
>>>>
>>>> i dont like it either, but i cannot change it.
>>>>
>>>> we currently don't use pyspark so i have no stake in this, but if we
>>>> did i can assure you we would not upgrade to spark 2.x if python 2.6 was
>>>> dropped. no point in developing something that doesnt run for majority of
>>>> customers.
>>>>
>>>> On Tue, Jan 5, 2016 at 5:19 PM, Nicholas Chammas <
>>>> nicholas.chammas@gmail.com> wrote:
>>>>
>>>>> As I pointed out in my earlier email, RHEL will support Python 2.6
>>>>> until 2020. So I'm assuming these large companies will have the option of
>>>>> riding out Python 2.6 until then.
>>>>>
>>>>> Are we seriously saying that Spark should likewise support Python 2.6
>>>>> for the next several years? Even though the core Python devs stopped
>>>>> supporting it in 2013?
>>>>>
>>>>> If that's not what we're suggesting, then when, roughly, can we drop
>>>>> support? What are the criteria?
>>>>>
>>>>> I understand the practical concern here. If companies are stuck using
>>>>> 2.6, it doesn't matter to them that it is deprecated. But balancing that
>>>>> concern against the maintenance burden on this project, I would say that
>>>>> "upgrade to Python 2.7 or stay on Spark 1.6.x" is a reasonable position to
>>>>> take. There are many tiny annoyances one has to put up with to support 2.6.
>>>>>
>>>>> I suppose if our main PySpark contributors are fine putting up with
>>>>> those annoyances, then maybe we don't need to drop support just yet...
>>>>>
>>>>> Nick
>>>>> On Tue, Jan 5, 2016 at 2:27 PM, Julio Antonio Soto de Vicente <ju...@esbet.es>
>>>>> wrote:
>>>>>
>>>>>> Unfortunately, Koert is right.
>>>>>>
>>>>>> I've been in a couple of projects using Spark (banking industry)
>>>>>> where CentOS + Python 2.6 is the toolbox available.
>>>>>>
>>>>>> That said, I believe it should not be a concern for Spark. Python 2.6
>>>>>> is old and busted, which is totally opposite to the Spark philosophy IMO.
>>>>>>
>>>>>>
>>>>>> El 5 ene 2016, a las 20:07, Koert Kuipers <ko...@tresata.com>
>>>>>> escribió:
>>>>>>
>>>>>> rhel/centos 6 ships with python 2.6, doesnt it?
>>>>>>
>>>>>> if so, i still know plenty of large companies where python 2.6 is the
>>>>>> only option. asking them for python 2.7 is not going to work
>>>>>>
>>>>>> so i think its a bad idea
>>>>>>
>>>>>> On Tue, Jan 5, 2016 at 1:52 PM, Juliet Hougland <
>>>>>> juliet.hougland@gmail.com> wrote:
>>>>>>
>>>>>>> I don't see a reason Spark 2.0 would need to support Python 2.6. At
>>>>>>> this point, Python 3 should be the default that is encouraged.
>>>>>>> Most organizations acknowledge that 2.7 is common, but lagging behind
>>>>>>> the version they should theoretically use. Dropping python 2.6
>>>>>>> support sounds very reasonable to me.
>>>>>>>
>>>>>>> On Tue, Jan 5, 2016 at 5:45 AM, Nicholas Chammas <
>>>>>>> nicholas.chammas@gmail.com> wrote:
>>>>>>>
>>>>>>>> +1
>>>>>>>>
>>>>>>>> Red Hat supports Python 2.6 on RHEL 5 until 2020
>>>>>>>> <https://alexgaynor.net/2015/mar/30/red-hat-open-source-community/>,
>>>>>>>> but otherwise yes, Python 2.6 is ancient history and the core Python
>>>>>>>> developers stopped supporting it in 2013. RHEL 5 is not a good enough
>>>>>>>> reason to continue support for Python 2.6 IMO.
>>>>>>>>
>>>>>>>> We should aim to support Python 2.7 and Python 3.3+ (which I
>>>>>>>> believe we currently do).
>>>>>>>>
>>>>>>>> Nick
>>>>>>>>
>>>>>>>> On Tue, Jan 5, 2016 at 8:01 AM Allen Zhang <al...@126.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> plus 1,
>>>>>>>>>
>>>>>>>>> we are currently using python 2.7.2 in production environment.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 2016-01-05 18:11:45, "Meethu Mathew" <me...@flytxt.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> +1
>>>>>>>>> We use Python 2.7
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>>
>>>>>>>>> Meethu Mathew
>>>>>>>>>
>>>>>>>>> On Tue, Jan 5, 2016 at 12:47 PM, Reynold Xin <rx...@databricks.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Does anybody here care about us dropping support for Python 2.6
>>>>>>>>>> in Spark 2.0?
>>>>>>>>>>
>>>>>>>>>> Python 2.6 is ancient, and is pretty slow in many aspects (e.g.
>>>>>>>>>> json parsing) when compared with Python 2.7. Some libraries that Spark
>>>>>>>>>> depends on stopped supporting 2.6. We can still convince the library
>>>>>>>>>> maintainers to support 2.6, but it will be extra work. I'm curious if
>>>>>>>>>> anybody still uses Python 2.6 to run Spark.
>>>>>>>>>>
>>>>>>>>>> Thanks.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>
>>

Re: [discuss] dropping Python 2.6 support

Posted by Nicholas Chammas <ni...@gmail.com>.
even if python 2.7 was needed only on this one machine that launches the
app we can not ship it with our software because its gpl licensed

Not to nitpick, but maybe this is important. The Python license is
GPL-compatible but not GPL <https://docs.python.org/3/license.html>:

Note GPL-compatible doesn’t mean that we’re distributing Python under the
GPL. All Python licenses, unlike the GPL, let you distribute a modified
version without making your changes open source. The GPL-compatible
licenses make it possible to combine Python with other software that is
released under the GPL; the others don’t.

Nick

On Tue, Jan 5, 2016 at 5:49 PM Koert Kuipers <ko...@tresata.com> wrote:

> i do not think so.
>
> does the python 2.7 need to be installed on all slaves? if so, we do not
> have direct access to those.
>
> also, spark is easy for us to ship with our software since its apache 2
> licensed, and it only needs to be present on the machine that launches the
> app (thanks to yarn).
> even if python 2.7 was needed only on this one machine that launches the
> app we can not ship it with our software because its gpl licensed, so the
> client would have to download it and install it themselves, and this would
> mean its an independent install which has to be audited and approved and
> now you are in for a lot of fun. basically it will never happen.
>
>
> On Tue, Jan 5, 2016 at 5:35 PM, Josh Rosen <jo...@databricks.com>
> wrote:
>
>> If users are able to install Spark 2.0 on their RHEL clusters, then I
>> imagine that they're also capable of installing a standalone Python
>> alongside that Spark version (without changing Python systemwide). For
>> instance, Anaconda/Miniconda make it really easy to install Python
>> 2.7.x/3.x without impacting / changing the system Python and don't
>> require any special permissions to install (you don't need root / sudo
>> access). Does this address the Python versioning concerns for RHEL users?
>>
>> On Tue, Jan 5, 2016 at 2:33 PM, Koert Kuipers <ko...@tresata.com> wrote:
>>
>>> yeah, the practical concern is that we have no control over java or
>>> python version on large company clusters. our current reality for the vast
>>> majority of them is java 7 and python 2.6, no matter how outdated that is.
>>>
>>> i dont like it either, but i cannot change it.
>>>
>>> we currently don't use pyspark so i have no stake in this, but if we did
>>> i can assure you we would not upgrade to spark 2.x if python 2.6 was
>>> dropped. no point in developing something that doesnt run for majority of
>>> customers.
>>>
>>> On Tue, Jan 5, 2016 at 5:19 PM, Nicholas Chammas <
>>> nicholas.chammas@gmail.com> wrote:
>>>
>>>> As I pointed out in my earlier email, RHEL will support Python 2.6
>>>> until 2020. So I'm assuming these large companies will have the option of
>>>> riding out Python 2.6 until then.
>>>>
>>>> Are we seriously saying that Spark should likewise support Python 2.6
>>>> for the next several years? Even though the core Python devs stopped
>>>> supporting it in 2013?
>>>>
>>>> If that's not what we're suggesting, then when, roughly, can we drop
>>>> support? What are the criteria?
>>>>
>>>> I understand the practical concern here. If companies are stuck using
>>>> 2.6, it doesn't matter to them that it is deprecated. But balancing that
>>>> concern against the maintenance burden on this project, I would say that
>>>> "upgrade to Python 2.7 or stay on Spark 1.6.x" is a reasonable position to
>>>> take. There are many tiny annoyances one has to put up with to support 2.6.
>>>>
>>>> I suppose if our main PySpark contributors are fine putting up with
>>>> those annoyances, then maybe we don't need to drop support just yet...
>>>>
>>>> Nick
>>>> On Tue, Jan 5, 2016 at 2:27 PM, Julio Antonio Soto de Vicente <ju...@esbet.es>
>>>> wrote:
>>>>
>>>>> Unfortunately, Koert is right.
>>>>>
>>>>> I've been in a couple of projects using Spark (banking industry) where
>>>>> CentOS + Python 2.6 is the toolbox available.
>>>>>
>>>>> That said, I believe it should not be a concern for Spark. Python 2.6
>>>>> is old and busted, which is totally opposite to the Spark philosophy IMO.
>>>>>
>>>>>
>>>>> El 5 ene 2016, a las 20:07, Koert Kuipers <ko...@tresata.com>
>>>>> escribió:
>>>>>
>>>>> rhel/centos 6 ships with python 2.6, doesnt it?
>>>>>
>>>>> if so, i still know plenty of large companies where python 2.6 is the
>>>>> only option. asking them for python 2.7 is not going to work
>>>>>
>>>>> so i think its a bad idea
>>>>>
>>>>> On Tue, Jan 5, 2016 at 1:52 PM, Juliet Hougland <
>>>>> juliet.hougland@gmail.com> wrote:
>>>>>
>>>>>> I don't see a reason Spark 2.0 would need to support Python 2.6. At
>>>>>> this point, Python 3 should be the default that is encouraged.
>>>>>> Most organizations acknowledge that 2.7 is common, but lagging behind
>>>>>> the version they should theoretically use. Dropping python 2.6
>>>>>> support sounds very reasonable to me.
>>>>>>
>>>>>> On Tue, Jan 5, 2016 at 5:45 AM, Nicholas Chammas <
>>>>>> nicholas.chammas@gmail.com> wrote:
>>>>>>
>>>>>>> +1
>>>>>>>
>>>>>>> Red Hat supports Python 2.6 on RHEL 5 until 2020
>>>>>>> <https://alexgaynor.net/2015/mar/30/red-hat-open-source-community/>,
>>>>>>> but otherwise yes, Python 2.6 is ancient history and the core Python
>>>>>>> developers stopped supporting it in 2013. RHEL 5 is not a good enough
>>>>>>> reason to continue support for Python 2.6 IMO.
>>>>>>>
>>>>>>> We should aim to support Python 2.7 and Python 3.3+ (which I believe
>>>>>>> we currently do).
>>>>>>>
>>>>>>> Nick
>>>>>>>
>>>>>>> On Tue, Jan 5, 2016 at 8:01 AM Allen Zhang <al...@126.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> plus 1,
>>>>>>>>
>>>>>>>> we are currently using python 2.7.2 in production environment.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2016-01-05 18:11:45, "Meethu Mathew" <me...@flytxt.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> +1
>>>>>>>> We use Python 2.7
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> Meethu Mathew
>>>>>>>>
>>>>>>>> On Tue, Jan 5, 2016 at 12:47 PM, Reynold Xin <rx...@databricks.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Does anybody here care about us dropping support for Python 2.6 in
>>>>>>>>> Spark 2.0?
>>>>>>>>>
>>>>>>>>> Python 2.6 is ancient, and is pretty slow in many aspects (e.g.
>>>>>>>>> json parsing) when compared with Python 2.7. Some libraries that Spark
>>>>>>>>> depends on stopped supporting 2.6. We can still convince the library
>>>>>>>>> maintainers to support 2.6, but it will be extra work. I'm curious if
>>>>>>>>> anybody still uses Python 2.6 to run Spark.
>>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>
>>>
>>
>

Re: [discuss] dropping Python 2.6 support

Posted by Koert Kuipers <ko...@tresata.com>.
i do not think so.

does the python 2.7 need to be installed on all slaves? if so, we do not
have direct access to those.

also, spark is easy for us to ship with our software since its apache 2
licensed, and it only needs to be present on the machine that launches the
app (thanks to yarn).
even if python 2.7 was needed only on this one machine that launches the
app we can not ship it with our software because its gpl licensed, so the
client would have to download it and install it themselves, and this would
mean its an independent install which has to be audited and approved and
now you are in for a lot of fun. basically it will never happen.


On Tue, Jan 5, 2016 at 5:35 PM, Josh Rosen <jo...@databricks.com> wrote:

> If users are able to install Spark 2.0 on their RHEL clusters, then I
> imagine that they're also capable of installing a standalone Python
> alongside that Spark version (without changing Python systemwide). For
> instance, Anaconda/Miniconda make it really easy to install Python
> 2.7.x/3.x without impacting / changing the system Python and don't
> require any special permissions to install (you don't need root / sudo
> access). Does this address the Python versioning concerns for RHEL users?
>
> On Tue, Jan 5, 2016 at 2:33 PM, Koert Kuipers <ko...@tresata.com> wrote:
>
>> yeah, the practical concern is that we have no control over java or
>> python version on large company clusters. our current reality for the vast
>> majority of them is java 7 and python 2.6, no matter how outdated that is.
>>
>> i dont like it either, but i cannot change it.
>>
>> we currently don't use pyspark so i have no stake in this, but if we did
>> i can assure you we would not upgrade to spark 2.x if python 2.6 was
>> dropped. no point in developing something that doesnt run for majority of
>> customers.
>>
>> On Tue, Jan 5, 2016 at 5:19 PM, Nicholas Chammas <
>> nicholas.chammas@gmail.com> wrote:
>>
>>> As I pointed out in my earlier email, RHEL will support Python 2.6 until
>>> 2020. So I'm assuming these large companies will have the option of riding
>>> out Python 2.6 until then.
>>>
>>> Are we seriously saying that Spark should likewise support Python 2.6
>>> for the next several years? Even though the core Python devs stopped
>>> supporting it in 2013?
>>>
>>> If that's not what we're suggesting, then when, roughly, can we drop
>>> support? What are the criteria?
>>>
>>> I understand the practical concern here. If companies are stuck using
>>> 2.6, it doesn't matter to them that it is deprecated. But balancing that
>>> concern against the maintenance burden on this project, I would say that
>>> "upgrade to Python 2.7 or stay on Spark 1.6.x" is a reasonable position to
>>> take. There are many tiny annoyances one has to put up with to support 2.6.
>>>
>>> I suppose if our main PySpark contributors are fine putting up with
>>> those annoyances, then maybe we don't need to drop support just yet...
>>>
>>> Nick
>>> On Tue, Jan 5, 2016 at 2:27 PM, Julio Antonio Soto de Vicente <ju...@esbet.es>
>>> wrote:
>>>
>>>> Unfortunately, Koert is right.
>>>>
>>>> I've been in a couple of projects using Spark (banking industry) where
>>>> CentOS + Python 2.6 is the toolbox available.
>>>>
>>>> That said, I believe it should not be a concern for Spark. Python 2.6
>>>> is old and busted, which is totally opposite to the Spark philosophy IMO.
>>>>
>>>>
>>>> El 5 ene 2016, a las 20:07, Koert Kuipers <ko...@tresata.com> escribió:
>>>>
>>>> rhel/centos 6 ships with python 2.6, doesnt it?
>>>>
>>>> if so, i still know plenty of large companies where python 2.6 is the
>>>> only option. asking them for python 2.7 is not going to work
>>>>
>>>> so i think its a bad idea
>>>>
>>>> On Tue, Jan 5, 2016 at 1:52 PM, Juliet Hougland <
>>>> juliet.hougland@gmail.com> wrote:
>>>>
>>>>> I don't see a reason Spark 2.0 would need to support Python 2.6. At
>>>>> this point, Python 3 should be the default that is encouraged.
>>>>> Most organizations acknowledge that 2.7 is common, but lagging behind
>>>>> the version they should theoretically use. Dropping python 2.6
>>>>> support sounds very reasonable to me.
>>>>>
>>>>> On Tue, Jan 5, 2016 at 5:45 AM, Nicholas Chammas <
>>>>> nicholas.chammas@gmail.com> wrote:
>>>>>
>>>>>> +1
>>>>>>
>>>>>> Red Hat supports Python 2.6 on RHEL 5 until 2020
>>>>>> <https://alexgaynor.net/2015/mar/30/red-hat-open-source-community/>,
>>>>>> but otherwise yes, Python 2.6 is ancient history and the core Python
>>>>>> developers stopped supporting it in 2013. RHEL 5 is not a good enough
>>>>>> reason to continue support for Python 2.6 IMO.
>>>>>>
>>>>>> We should aim to support Python 2.7 and Python 3.3+ (which I believe
>>>>>> we currently do).
>>>>>>
>>>>>> Nick
>>>>>>
>>>>>> On Tue, Jan 5, 2016 at 8:01 AM Allen Zhang <al...@126.com>
>>>>>> wrote:
>>>>>>
>>>>>>> plus 1,
>>>>>>>
>>>>>>> we are currently using python 2.7.2 in production environment.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 2016-01-05 18:11:45, "Meethu Mathew" <me...@flytxt.com> wrote:
>>>>>>>
>>>>>>> +1
>>>>>>> We use Python 2.7
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Meethu Mathew
>>>>>>>
>>>>>>> On Tue, Jan 5, 2016 at 12:47 PM, Reynold Xin <rx...@databricks.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Does anybody here care about us dropping support for Python 2.6 in
>>>>>>>> Spark 2.0?
>>>>>>>>
>>>>>>>> Python 2.6 is ancient, and is pretty slow in many aspects (e.g.
>>>>>>>> json parsing) when compared with Python 2.7. Some libraries that Spark
>>>>>>>> depends on stopped supporting 2.6. We can still convince the library
>>>>>>>> maintainers to support 2.6, but it will be extra work. I'm curious if
>>>>>>>> anybody still uses Python 2.6 to run Spark.
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>
>

Re: [discuss] dropping Python 2.6 support

Posted by Koert Kuipers <ko...@tresata.com>.
i do not think so.

does the python 2.7 need to be installed on all slaves? if so, we do not
have direct access to those.

also, spark is easy for us to ship with our software since its apache 2
licensed, and it only needs to be present on the machine that launches the
app (thanks to yarn).
even if python 2.7 was needed only on this one machine that launches the
app we can not ship it with our software because its gpl licensed, so the
client would have to download it and install it themselves, and this would
mean its an independent install which has to be audited and approved and
now you are in for a lot of fun. basically it will never happen.


On Tue, Jan 5, 2016 at 5:35 PM, Josh Rosen <jo...@databricks.com> wrote:

> If users are able to install Spark 2.0 on their RHEL clusters, then I
> imagine that they're also capable of installing a standalone Python
> alongside that Spark version (without changing Python systemwide). For
> instance, Anaconda/Miniconda make it really easy to install Python
> 2.7.x/3.x without impacting / changing the system Python and doesn't
> require any special permissions to install (you don't need root / sudo
> access). Does this address the Python versioning concerns for RHEL users?
>
> On Tue, Jan 5, 2016 at 2:33 PM, Koert Kuipers <ko...@tresata.com> wrote:
>
>> yeah, the practical concern is that we have no control over java or
>> python version on large company clusters. our current reality for the vast
>> majority of them is java 7 and python 2.6, no matter how outdated that is.
>>
>> i dont like it either, but i cannot change it.
>>
>> we currently don't use pyspark so i have no stake in this, but if we did
>> i can assure you we would not upgrade to spark 2.x if python 2.6 was
>> dropped. no point in developing something that doesnt run for majority of
>> customers.
>>
>> On Tue, Jan 5, 2016 at 5:19 PM, Nicholas Chammas <
>> nicholas.chammas@gmail.com> wrote:
>>
>>> As I pointed out in my earlier email, RHEL will support Python 2.6 until
>>> 2020. So I'm assuming these large companies will have the option of riding
>>> out Python 2.6 until then.
>>>
>>> Are we seriously saying that Spark should likewise support Python 2.6
>>> for the next several years? Even though the core Python devs stopped
>>> supporting it in 2013?
>>>
>>> If that's not what we're suggesting, then when, roughly, can we drop
>>> support? What are the criteria?
>>>
>>> I understand the practical concern here. If companies are stuck using
>>> 2.6, it doesn't matter to them that it is deprecated. But balancing that
>>> concern against the maintenance burden on this project, I would say that
>>> "upgrade to Python 2.7 or stay on Spark 1.6.x" is a reasonable position to
>>> take. There are many tiny annoyances one has to put up with to support 2.6.
>>>
>>> I suppose if our main PySpark contributors are fine putting up with
>>> those annoyances, then maybe we don't need to drop support just yet...
>>>
>>> Nick
>>> 2016년 1월 5일 (화) 오후 2:27, Julio Antonio Soto de Vicente <ju...@esbet.es>님이
>>> 작성:
>>>
>>>> Unfortunately, Koert is right.
>>>>
>>>> I've been in a couple of projects using Spark (banking industry) where
>>>> CentOS + Python 2.6 is the toolbox available.
>>>>
>>>> That said, I believe it should not be a concern for Spark. Python 2.6
>>>> is old and busted, which is totally opposite to the Spark philosophy IMO.
>>>>
>>>>
>>>> El 5 ene 2016, a las 20:07, Koert Kuipers <ko...@tresata.com> escribió:
>>>>
>>>> rhel/centos 6 ships with python 2.6, doesnt it?
>>>>
>>>> if so, i still know plenty of large companies where python 2.6 is the
>>>> only option. asking them for python 2.7 is not going to work
>>>>
>>>> so i think its a bad idea
>>>>
>>>> On Tue, Jan 5, 2016 at 1:52 PM, Juliet Hougland <
>>>> juliet.hougland@gmail.com> wrote:
>>>>
>>>>> I don't see a reason Spark 2.0 would need to support Python 2.6. At
>>>>> this point, Python 3 should be the default that is encouraged.
>>>>> Most organizations acknowledge the 2.7 is common, but lagging behind
>>>>> the version they should theoretically use. Dropping python 2.6
>>>>> support sounds very reasonable to me.
>>>>>
>>>>> On Tue, Jan 5, 2016 at 5:45 AM, Nicholas Chammas <
>>>>> nicholas.chammas@gmail.com> wrote:
>>>>>
>>>>>> +1
>>>>>>
>>>>>> Red Hat supports Python 2.6 on REHL 5 until 2020
>>>>>> <https://alexgaynor.net/2015/mar/30/red-hat-open-source-community/>,
>>>>>> but otherwise yes, Python 2.6 is ancient history and the core Python
>>>>>> developers stopped supporting it in 2013. REHL 5 is not a good enough
>>>>>> reason to continue support for Python 2.6 IMO.
>>>>>>
>>>>>> We should aim to support Python 2.7 and Python 3.3+ (which I believe
>>>>>> we currently do).
>>>>>>
>>>>>> Nick
>>>>>>
>>>>>> On Tue, Jan 5, 2016 at 8:01 AM Allen Zhang <al...@126.com>
>>>>>> wrote:
>>>>>>
>>>>>>> plus 1,
>>>>>>>
>>>>>>> we are currently using python 2.7.2 in production environment.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 2016-01-05 18:11:45, "Meethu Mathew" <me...@flytxt.com> wrote:
>>>>>>>
>>>>>>> +1
>>>>>>> We use Python 2.7
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Meethu Mathew
>>>>>>>
>>>>>>> On Tue, Jan 5, 2016 at 12:47 PM, Reynold Xin <rx...@databricks.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Does anybody here care about us dropping support for Python 2.6 in
>>>>>>>> Spark 2.0?
>>>>>>>>
>>>>>>>> Python 2.6 is ancient, and is pretty slow in many aspects (e.g.
>>>>>>>> json parsing) when compared with Python 2.7. Some libraries that Spark
>>>>>>>> depend on stopped supporting 2.6. We can still convince the library
>>>>>>>> maintainers to support 2.6, but it will be extra work. I'm curious if
>>>>>>>> anybody still uses Python 2.6 to run Spark.
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>
>

Re: [discuss] dropping Python 2.6 support

Posted by Josh Rosen <jo...@databricks.com>.
If users are able to install Spark 2.0 on their RHEL clusters, then I
imagine that they're also capable of installing a standalone Python
alongside that Spark version (without changing Python systemwide). For
instance, Anaconda/Miniconda make it really easy to install Python
2.7.x/3.x without impacting or changing the system Python, and they don't
require any special permissions to install (you don't need root / sudo
access). Does this address the Python versioning concerns for RHEL users?
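
(A minimal sketch of that setup, assuming a user-local Miniconda at
~/miniconda2; PYSPARK_PYTHON is the environment variable PySpark consults
to pick the worker interpreter, and it must be set before the SparkContext
starts:)

    # Point PySpark at a user-local interpreter instead of the system
    # Python 2.6. No root / sudo access is needed for any of this.
    import os
    os.environ["PYSPARK_PYTHON"] = os.path.expanduser("~/miniconda2/bin/python")

    import sys
    from pyspark import SparkConf, SparkContext

    sc = SparkContext(conf=SparkConf().setAppName("interpreter-check"))
    # Each task reports the interpreter version it actually runs under.
    print(sc.parallelize([0]).map(lambda _: sys.version).first())
    sc.stop()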

On Tue, Jan 5, 2016 at 2:33 PM, Koert Kuipers <ko...@tresata.com> wrote:

> yeah, the practical concern is that we have no control over java or python
> version on large company clusters. our current reality for the vast
> majority of them is java 7 and python 2.6, no matter how outdated that is.
>
> i dont like it either, but i cannot change it.
>
> we currently don't use pyspark so i have no stake in this, but if we did i
> can assure you we would not upgrade to spark 2.x if python 2.6 was dropped.
> no point in developing something that doesnt run for majority of customers.
>
> On Tue, Jan 5, 2016 at 5:19 PM, Nicholas Chammas <
> nicholas.chammas@gmail.com> wrote:
>
>> As I pointed out in my earlier email, RHEL will support Python 2.6 until
>> 2020. So I'm assuming these large companies will have the option of riding
>> out Python 2.6 until then.
>>
>> Are we seriously saying that Spark should likewise support Python 2.6 for
>> the next several years? Even though the core Python devs stopped supporting
>> it in 2013?
>>
>> If that's not what we're suggesting, then when, roughly, can we drop
>> support? What are the criteria?
>>
>> I understand the practical concern here. If companies are stuck using
>> 2.6, it doesn't matter to them that it is deprecated. But balancing that
>> concern against the maintenance burden on this project, I would say that
>> "upgrade to Python 2.7 or stay on Spark 1.6.x" is a reasonable position to
>> take. There are many tiny annoyances one has to put up with to support 2.6.
>>
>> I suppose if our main PySpark contributors are fine putting up with those
>> annoyances, then maybe we don't need to drop support just yet...
>>
>> Nick
>> On Tue, Jan 5, 2016 at 2:27 PM, Julio Antonio Soto de Vicente
>> <ju...@esbet.es> wrote:
>>
>>> Unfortunately, Koert is right.
>>>
>>> I've been in a couple of projects using Spark (banking industry) where
>>> CentOS + Python 2.6 is the toolbox available.
>>>
>>> That said, I believe it should not be a concern for Spark. Python 2.6 is
>>> old and busted, which is totally opposite to the Spark philosophy IMO.
>>>
>>>
>>> El 5 ene 2016, a las 20:07, Koert Kuipers <ko...@tresata.com> escribió:
>>>
>>> rhel/centos 6 ships with python 2.6, doesnt it?
>>>
>>> if so, i still know plenty of large companies where python 2.6 is the
>>> only option. asking them for python 2.7 is not going to work
>>>
>>> so i think its a bad idea
>>>
>>> On Tue, Jan 5, 2016 at 1:52 PM, Juliet Hougland <
>>> juliet.hougland@gmail.com> wrote:
>>>
>>>> I don't see a reason Spark 2.0 would need to support Python 2.6. At
>>>> this point, Python 3 should be the default that is encouraged.
>>>> Most organizations acknowledge the 2.7 is common, but lagging behind
>>>> the version they should theoretically use. Dropping python 2.6
>>>> support sounds very reasonable to me.
>>>>
>>>> On Tue, Jan 5, 2016 at 5:45 AM, Nicholas Chammas <
>>>> nicholas.chammas@gmail.com> wrote:
>>>>
>>>>> +1
>>>>>
>>>>> Red Hat supports Python 2.6 on REHL 5 until 2020
>>>>> <https://alexgaynor.net/2015/mar/30/red-hat-open-source-community/>,
>>>>> but otherwise yes, Python 2.6 is ancient history and the core Python
>>>>> developers stopped supporting it in 2013. REHL 5 is not a good enough
>>>>> reason to continue support for Python 2.6 IMO.
>>>>>
>>>>> We should aim to support Python 2.7 and Python 3.3+ (which I believe
>>>>> we currently do).
>>>>>
>>>>> Nick
>>>>>
>>>>> On Tue, Jan 5, 2016 at 8:01 AM Allen Zhang <al...@126.com>
>>>>> wrote:
>>>>>
>>>>>> plus 1,
>>>>>>
>>>>>> we are currently using python 2.7.2 in production environment.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 2016-01-05 18:11:45, "Meethu Mathew" <me...@flytxt.com> wrote:
>>>>>>
>>>>>> +1
>>>>>> We use Python 2.7
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Meethu Mathew
>>>>>>
>>>>>> On Tue, Jan 5, 2016 at 12:47 PM, Reynold Xin <rx...@databricks.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Does anybody here care about us dropping support for Python 2.6 in
>>>>>>> Spark 2.0?
>>>>>>>
>>>>>>> Python 2.6 is ancient, and is pretty slow in many aspects (e.g. json
>>>>>>> parsing) when compared with Python 2.7. Some libraries that Spark depend on
>>>>>>> stopped supporting 2.6. We can still convince the library maintainers to
>>>>>>> support 2.6, but it will be extra work. I'm curious if anybody still uses
>>>>>>> Python 2.6 to run Spark.
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>
>

Re: [discuss] dropping Python 2.6 support

Posted by Koert Kuipers <ko...@tresata.com>.
yeah, the practical concern is that we have no control over java or python
version on large company clusters. our current reality for the vast
majority of them is java 7 and python 2.6, no matter how outdated that is.

i don't like it either, but i cannot change it.

we currently don't use pyspark so i have no stake in this, but if we did i
can assure you we would not upgrade to spark 2.x if python 2.6 was dropped.
no point in developing something that doesn't run for the majority of
customers.

On Tue, Jan 5, 2016 at 5:19 PM, Nicholas Chammas <nicholas.chammas@gmail.com
> wrote:

> As I pointed out in my earlier email, RHEL will support Python 2.6 until
> 2020. So I'm assuming these large companies will have the option of riding
> out Python 2.6 until then.
>
> Are we seriously saying that Spark should likewise support Python 2.6 for
> the next several years? Even though the core Python devs stopped supporting
> it in 2013?
>
> If that's not what we're suggesting, then when, roughly, can we drop
> support? What are the criteria?
>
> I understand the practical concern here. If companies are stuck using 2.6,
> it doesn't matter to them that it is deprecated. But balancing that concern
> against the maintenance burden on this project, I would say that "upgrade
> to Python 2.7 or stay on Spark 1.6.x" is a reasonable position to take.
> There are many tiny annoyances one has to put up with to support 2.6.
>
> I suppose if our main PySpark contributors are fine putting up with those
> annoyances, then maybe we don't need to drop support just yet...
>
> Nick
> On Tue, Jan 5, 2016 at 2:27 PM, Julio Antonio Soto de Vicente
> <ju...@esbet.es> wrote:
>
>> Unfortunately, Koert is right.
>>
>> I've been in a couple of projects using Spark (banking industry) where
>> CentOS + Python 2.6 is the toolbox available.
>>
>> That said, I believe it should not be a concern for Spark. Python 2.6 is
>> old and busted, which is totally opposite to the Spark philosophy IMO.
>>
>>
>> El 5 ene 2016, a las 20:07, Koert Kuipers <ko...@tresata.com> escribió:
>>
>> rhel/centos 6 ships with python 2.6, doesnt it?
>>
>> if so, i still know plenty of large companies where python 2.6 is the
>> only option. asking them for python 2.7 is not going to work
>>
>> so i think its a bad idea
>>
>> On Tue, Jan 5, 2016 at 1:52 PM, Juliet Hougland <
>> juliet.hougland@gmail.com> wrote:
>>
>>> I don't see a reason Spark 2.0 would need to support Python 2.6. At this
>>> point, Python 3 should be the default that is encouraged.
>>> Most organizations acknowledge the 2.7 is common, but lagging behind the
>>> version they should theoretically use. Dropping python 2.6
>>> support sounds very reasonable to me.
>>>
>>> On Tue, Jan 5, 2016 at 5:45 AM, Nicholas Chammas <
>>> nicholas.chammas@gmail.com> wrote:
>>>
>>>> +1
>>>>
>>>> Red Hat supports Python 2.6 on REHL 5 until 2020
>>>> <https://alexgaynor.net/2015/mar/30/red-hat-open-source-community/>,
>>>> but otherwise yes, Python 2.6 is ancient history and the core Python
>>>> developers stopped supporting it in 2013. REHL 5 is not a good enough
>>>> reason to continue support for Python 2.6 IMO.
>>>>
>>>> We should aim to support Python 2.7 and Python 3.3+ (which I believe we
>>>> currently do).
>>>>
>>>> Nick
>>>>
>>>> On Tue, Jan 5, 2016 at 8:01 AM Allen Zhang <al...@126.com>
>>>> wrote:
>>>>
>>>>> plus 1,
>>>>>
>>>>> we are currently using python 2.7.2 in production environment.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 2016-01-05 18:11:45, "Meethu Mathew" <me...@flytxt.com> wrote:
>>>>>
>>>>> +1
>>>>> We use Python 2.7
>>>>>
>>>>> Regards,
>>>>>
>>>>> Meethu Mathew
>>>>>
>>>>> On Tue, Jan 5, 2016 at 12:47 PM, Reynold Xin <rx...@databricks.com>
>>>>> wrote:
>>>>>
>>>>>> Does anybody here care about us dropping support for Python 2.6 in
>>>>>> Spark 2.0?
>>>>>>
>>>>>> Python 2.6 is ancient, and is pretty slow in many aspects (e.g. json
>>>>>> parsing) when compared with Python 2.7. Some libraries that Spark depend on
>>>>>> stopped supporting 2.6. We can still convince the library maintainers to
>>>>>> support 2.6, but it will be extra work. I'm curious if anybody still uses
>>>>>> Python 2.6 to run Spark.
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>
>>

Re: [discuss] dropping Python 2.6 support

Posted by Nicholas Chammas <ni...@gmail.com>.
As I pointed out in my earlier email, RHEL will support Python 2.6 until
2020. So I'm assuming these large companies will have the option of riding
out Python 2.6 until then.

Are we seriously saying that Spark should likewise support Python 2.6 for
the next several years? Even though the core Python devs stopped supporting
it in 2013?

If that's not what we're suggesting, then when, roughly, can we drop
support? What are the criteria?

I understand the practical concern here. If companies are stuck using 2.6,
it doesn't matter to them that it is deprecated. But balancing that concern
against the maintenance burden on this project, I would say that "upgrade
to Python 2.7 or stay on Spark 1.6.x" is a reasonable position to take.
There are many tiny annoyances one has to put up with to support 2.6.
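
(For concreteness, a few of those annoyances -- an illustrative sketch of
idioms that run on Python 2.7 but fail on 2.6:)

    # Each of these fails on 2.6 (SyntaxError, ValueError, or ImportError):
    squares = {n: n * n for n in range(5)}        # dict comprehension: 2.7+
    evens = {n for n in range(10) if n % 2 == 0}  # set comprehension: 2.7+
    msg = "{} of {}".format(1, 2)                 # auto-numbered format
                                                  # fields: 2.6 needs {0}/{1}
    from collections import OrderedDict           # new in 2.7
    import argparse                               # in the stdlib since 2.7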

I suppose if our main PySpark contributors are fine putting up with those
annoyances, then maybe we don't need to drop support just yet...

Nick
On Tue, Jan 5, 2016 at 2:27 PM, Julio Antonio Soto de Vicente
<ju...@esbet.es> wrote:

> Unfortunately, Koert is right.
>
> I've been in a couple of projects using Spark (banking industry) where
> CentOS + Python 2.6 is the toolbox available.
>
> That said, I believe it should not be a concern for Spark. Python 2.6 is
> old and busted, which is totally opposite to the Spark philosophy IMO.
>
>
> El 5 ene 2016, a las 20:07, Koert Kuipers <ko...@tresata.com> escribió:
>
> rhel/centos 6 ships with python 2.6, doesnt it?
>
> if so, i still know plenty of large companies where python 2.6 is the only
> option. asking them for python 2.7 is not going to work
>
> so i think its a bad idea
>
> On Tue, Jan 5, 2016 at 1:52 PM, Juliet Hougland <juliet.hougland@gmail.com
> > wrote:
>
>> I don't see a reason Spark 2.0 would need to support Python 2.6. At this
>> point, Python 3 should be the default that is encouraged.
>> Most organizations acknowledge the 2.7 is common, but lagging behind the
>> version they should theoretically use. Dropping python 2.6
>> support sounds very reasonable to me.
>>
>> On Tue, Jan 5, 2016 at 5:45 AM, Nicholas Chammas <
>> nicholas.chammas@gmail.com> wrote:
>>
>>> +1
>>>
>>> Red Hat supports Python 2.6 on REHL 5 until 2020
>>> <https://alexgaynor.net/2015/mar/30/red-hat-open-source-community/>,
>>> but otherwise yes, Python 2.6 is ancient history and the core Python
>>> developers stopped supporting it in 2013. REHL 5 is not a good enough
>>> reason to continue support for Python 2.6 IMO.
>>>
>>> We should aim to support Python 2.7 and Python 3.3+ (which I believe we
>>> currently do).
>>>
>>> Nick
>>>
>>> On Tue, Jan 5, 2016 at 8:01 AM Allen Zhang <al...@126.com>
>>> wrote:
>>>
>>>> plus 1,
>>>>
>>>> we are currently using python 2.7.2 in production environment.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 2016-01-05 18:11:45, "Meethu Mathew" <me...@flytxt.com> wrote:
>>>>
>>>> +1
>>>> We use Python 2.7
>>>>
>>>> Regards,
>>>>
>>>> Meethu Mathew
>>>>
>>>> On Tue, Jan 5, 2016 at 12:47 PM, Reynold Xin <rx...@databricks.com>
>>>> wrote:
>>>>
>>>>> Does anybody here care about us dropping support for Python 2.6 in
>>>>> Spark 2.0?
>>>>>
>>>>> Python 2.6 is ancient, and is pretty slow in many aspects (e.g. json
>>>>> parsing) when compared with Python 2.7. Some libraries that Spark depend on
>>>>> stopped supporting 2.6. We can still convince the library maintainers to
>>>>> support 2.6, but it will be extra work. I'm curious if anybody still uses
>>>>> Python 2.6 to run Spark.
>>>>>
>>>>> Thanks.
>>>>>
>>>>>
>>>>>
>>>>
>>
>

Re: [discuss] dropping Python 2.6 support

Posted by Julio Antonio Soto de Vicente <ju...@esbet.es>.
Unfortunately, Koert is right.

I've been in a couple of projects using Spark (banking industry) where CentOS + Python 2.6 is the toolbox available. 

That said, I believe it should not be a concern for Spark. Python 2.6 is old and busted, which is totally opposite to the Spark philosophy IMO.


> El 5 ene 2016, a las 20:07, Koert Kuipers <ko...@tresata.com> escribió:
> 
> rhel/centos 6 ships with python 2.6, doesnt it?
> 
> if so, i still know plenty of large companies where python 2.6 is the only option. asking them for python 2.7 is not going to work
> 
> so i think its a bad idea
> 
>> On Tue, Jan 5, 2016 at 1:52 PM, Juliet Hougland <ju...@gmail.com> wrote:
>> I don't see a reason Spark 2.0 would need to support Python 2.6. At this point, Python 3 should be the default that is encouraged.
>> Most organizations acknowledge the 2.7 is common, but lagging behind the version they should theoretically use. Dropping python 2.6
>> support sounds very reasonable to me.
>> 
>>> On Tue, Jan 5, 2016 at 5:45 AM, Nicholas Chammas <ni...@gmail.com> wrote:
>> 
>>> +1
>>> 
>>> Red Hat supports Python 2.6 on REHL 5 until 2020, but otherwise yes, Python 2.6 is ancient history and the core Python developers stopped supporting it in 2013. REHL 5 is not a good enough reason to continue support for Python 2.6 IMO.
>>> 
>>> We should aim to support Python 2.7 and Python 3.3+ (which I believe we currently do).
>>> 
>>> Nick
>>> 
>>>> On Tue, Jan 5, 2016 at 8:01 AM Allen Zhang <al...@126.com> wrote:
>>>> plus 1,
>>>> 
>>>> we are currently using python 2.7.2 in production environment.
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On 2016-01-05 18:11:45, "Meethu Mathew" <me...@flytxt.com> wrote:
>>>> +1
>>>> We use Python 2.7
>>>> 
>>>> Regards,
>>>>  
>>>> Meethu Mathew
>>>> 
>>>>> On Tue, Jan 5, 2016 at 12:47 PM, Reynold Xin <rx...@databricks.com> wrote:
>>>>> Does anybody here care about us dropping support for Python 2.6 in Spark 2.0? 
>>>>> 
>>>>> Python 2.6 is ancient, and is pretty slow in many aspects (e.g. json parsing) when compared with Python 2.7. Some libraries that Spark depend on stopped supporting 2.6. We can still convince the library maintainers to support 2.6, but it will be extra work. I'm curious if anybody still uses Python 2.6 to run Spark.
>>>>> 
>>>>> Thanks.
> 

Re: [discuss] dropping Python 2.6 support

Posted by Jacek Laskowski <ja...@japila.pl>.
On Sat, Jan 9, 2016 at 1:48 PM, Sean Owen <so...@cloudera.com> wrote:

> (For similar reasons I personally don't favor supporting Java 7 or
> Scala 2.10 in Spark 2.x.)

That reflects my sentiments as well. Thanks Sean for bringing that up!

Jacek

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: [discuss] dropping Python 2.6 support

Posted by Sean Owen <so...@cloudera.com>.
Chiming in late, but my take on this line of argument is: these
companies are welcome to keep using Spark 1.x. If anything, the
argument here is about how long to maintain 1.x, and indeed, it's
going to go dormant quite soon.

But using RHEL 6 (or any older version of any platform) and not
wanting to update already means you prefer stability over change.
I don't see an expectation that major releases of major things
support older major releases of other things.

Conversely: supporting something in Spark 2.x means making sure
nothing breaks compatibility with it for a couple of years. This is
effort that could be spent elsewhere; that has to be weighed.

(For similar reasons I personally don't favor supporting Java 7 or
Scala 2.10 in Spark 2.x.)

On Tue, Jan 5, 2016 at 7:07 PM, Koert Kuipers <ko...@tresata.com> wrote:
> rhel/centos 6 ships with python 2.6, doesnt it?
>
> if so, i still know plenty of large companies where python 2.6 is the only
> option. asking them for python 2.7 is not going to work
>
> so i think its a bad idea
>
> On Tue, Jan 5, 2016 at 1:52 PM, Juliet Hougland <ju...@gmail.com>
> wrote:
>>
>> I don't see a reason Spark 2.0 would need to support Python 2.6. At this
>> point, Python 3 should be the default that is encouraged.
>> Most organizations acknowledge the 2.7 is common, but lagging behind the
>> version they should theoretically use. Dropping python 2.6
>> support sounds very reasonable to me.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: [discuss] dropping Python 2.6 support

Posted by Koert Kuipers <ko...@tresata.com>.
rhel/centos 6 ships with python 2.6, doesn't it?

if so, i still know plenty of large companies where python 2.6 is the only
option. asking them for python 2.7 is not going to work.

so i think it's a bad idea.

On Tue, Jan 5, 2016 at 1:52 PM, Juliet Hougland <ju...@gmail.com>
wrote:

> I don't see a reason Spark 2.0 would need to support Python 2.6. At this
> point, Python 3 should be the default that is encouraged.
> Most organizations acknowledge the 2.7 is common, but lagging behind the
> version they should theoretically use. Dropping python 2.6
> support sounds very reasonable to me.
>
> On Tue, Jan 5, 2016 at 5:45 AM, Nicholas Chammas <
> nicholas.chammas@gmail.com> wrote:
>
>> +1
>>
>> Red Hat supports Python 2.6 on REHL 5 until 2020
>> <https://alexgaynor.net/2015/mar/30/red-hat-open-source-community/>, but
>> otherwise yes, Python 2.6 is ancient history and the core Python developers
>> stopped supporting it in 2013. REHL 5 is not a good enough reason to
>> continue support for Python 2.6 IMO.
>>
>> We should aim to support Python 2.7 and Python 3.3+ (which I believe we
>> currently do).
>>
>> Nick
>>
>> On Tue, Jan 5, 2016 at 8:01 AM Allen Zhang <al...@126.com> wrote:
>>
>>> plus 1,
>>>
>>> we are currently using python 2.7.2 in production environment.
>>>
>>>
>>>
>>>
>>>
>>> On 2016-01-05 18:11:45, "Meethu Mathew" <me...@flytxt.com> wrote:
>>>
>>> +1
>>> We use Python 2.7
>>>
>>> Regards,
>>>
>>> Meethu Mathew
>>>
>>> On Tue, Jan 5, 2016 at 12:47 PM, Reynold Xin <rx...@databricks.com>
>>> wrote:
>>>
>>>> Does anybody here care about us dropping support for Python 2.6 in
>>>> Spark 2.0?
>>>>
>>>> Python 2.6 is ancient, and is pretty slow in many aspects (e.g. json
>>>> parsing) when compared with Python 2.7. Some libraries that Spark depend on
>>>> stopped supporting 2.6. We can still convince the library maintainers to
>>>> support 2.6, but it will be extra work. I'm curious if anybody still uses
>>>> Python 2.6 to run Spark.
>>>>
>>>> Thanks.
>>>>
>>>>
>>>>
>>>
>

Re: [discuss] dropping Python 2.6 support

Posted by Juliet Hougland <ju...@gmail.com>.
I don't see a reason Spark 2.0 would need to support Python 2.6. At this
point, Python 3 should be the default that is encouraged.
Most organizations acknowledge that 2.7 is common, but it lags behind the
version they should theoretically use. Dropping Python 2.6
support sounds very reasonable to me.

On Tue, Jan 5, 2016 at 5:45 AM, Nicholas Chammas <nicholas.chammas@gmail.com
> wrote:

> +1
>
> Red Hat supports Python 2.6 on REHL 5 until 2020
> <https://alexgaynor.net/2015/mar/30/red-hat-open-source-community/>, but
> otherwise yes, Python 2.6 is ancient history and the core Python developers
> stopped supporting it in 2013. REHL 5 is not a good enough reason to
> continue support for Python 2.6 IMO.
>
> We should aim to support Python 2.7 and Python 3.3+ (which I believe we
> currently do).
>
> Nick
>
> On Tue, Jan 5, 2016 at 8:01 AM Allen Zhang <al...@126.com> wrote:
>
>> plus 1,
>>
>> we are currently using python 2.7.2 in production environment.
>>
>>
>>
>>
>>
>> On 2016-01-05 18:11:45, "Meethu Mathew" <me...@flytxt.com> wrote:
>>
>> +1
>> We use Python 2.7
>>
>> Regards,
>>
>> Meethu Mathew
>>
>> On Tue, Jan 5, 2016 at 12:47 PM, Reynold Xin <rx...@databricks.com> wrote:
>>
>>> Does anybody here care about us dropping support for Python 2.6 in Spark
>>> 2.0?
>>>
>>> Python 2.6 is ancient, and is pretty slow in many aspects (e.g. json
>>> parsing) when compared with Python 2.7. Some libraries that Spark depend on
>>> stopped supporting 2.6. We can still convince the library maintainers to
>>> support 2.6, but it will be extra work. I'm curious if anybody still uses
>>> Python 2.6 to run Spark.
>>>
>>> Thanks.
>>>
>>>
>>>
>>

Re: [discuss] dropping Python 2.6 support

Posted by Davies Liu <da...@databricks.com>.
+1

On Tue, Jan 5, 2016 at 5:45 AM, Nicholas Chammas
<ni...@gmail.com> wrote:
> +1
>
> Red Hat supports Python 2.6 on REHL 5 until 2020, but otherwise yes, Python
> 2.6 is ancient history and the core Python developers stopped supporting it
> in 2013. REHL 5 is not a good enough reason to continue support for Python
> 2.6 IMO.
>
> We should aim to support Python 2.7 and Python 3.3+ (which I believe we
> currently do).
>
> Nick
>
> On Tue, Jan 5, 2016 at 8:01 AM Allen Zhang <al...@126.com> wrote:
>>
>> plus 1,
>>
>> we are currently using python 2.7.2 in production environment.
>>
>>
>>
>>
>>
>> On 2016-01-05 18:11:45, "Meethu Mathew" <me...@flytxt.com> wrote:
>>
>> +1
>> We use Python 2.7
>>
>> Regards,
>>
>> Meethu Mathew
>>
>> On Tue, Jan 5, 2016 at 12:47 PM, Reynold Xin <rx...@databricks.com> wrote:
>>>
>>> Does anybody here care about us dropping support for Python 2.6 in Spark
>>> 2.0?
>>>
>>> Python 2.6 is ancient, and is pretty slow in many aspects (e.g. json
>>> parsing) when compared with Python 2.7. Some libraries that Spark depend on
>>> stopped supporting 2.6. We can still convince the library maintainers to
>>> support 2.6, but it will be extra work. I'm curious if anybody still uses
>>> Python 2.6 to run Spark.
>>>
>>> Thanks.
>>>
>>>
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: [discuss] dropping Python 2.6 support

Posted by Nicholas Chammas <ni...@gmail.com>.
+1

Red Hat supports Python 2.6 on RHEL 5 until 2020
<https://alexgaynor.net/2015/mar/30/red-hat-open-source-community/>, but
otherwise yes, Python 2.6 is ancient history and the core Python developers
stopped supporting it in 2013. RHEL 5 is not a good enough reason to
continue support for Python 2.6 IMO.

We should aim to support Python 2.7 and Python 3.3+ (which I believe we
currently do).
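
(As a sketch, that support matrix amounts to a guard like the following --
a hypothetical check for illustration, not Spark's actual startup code:)

    import sys

    # Accept CPython 2.7.x or 3.3+; reject 2.6 and below, and 3.0-3.2.
    if sys.version_info < (2, 7) or (3, 0) <= sys.version_info < (3, 3):
        raise RuntimeError("Python 2.7 or 3.3+ is required; found %s"
                           % sys.version.split()[0])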

Nick

On Tue, Jan 5, 2016 at 8:01 AM Allen Zhang <al...@126.com> wrote:

> plus 1,
>
> we are currently using python 2.7.2 in production environment.
>
>
>
>
>
> On 2016-01-05 18:11:45, "Meethu Mathew" <me...@flytxt.com> wrote:
>
> +1
> We use Python 2.7
>
> Regards,
>
> Meethu Mathew
>
> On Tue, Jan 5, 2016 at 12:47 PM, Reynold Xin <rx...@databricks.com> wrote:
>
>> Does anybody here care about us dropping support for Python 2.6 in Spark
>> 2.0?
>>
>> Python 2.6 is ancient, and is pretty slow in many aspects (e.g. json
>> parsing) when compared with Python 2.7. Some libraries that Spark depend on
>> stopped supporting 2.6. We can still convince the library maintainers to
>> support 2.6, but it will be extra work. I'm curious if anybody still uses
>> Python 2.6 to run Spark.
>>
>> Thanks.
>>
>>
>>
>

Re: [discuss] dropping Python 2.6 support

Posted by Allen Zhang <al...@126.com>.
plus 1,


we are currently using Python 2.7.2 in our production environment.






On 2016-01-05 18:11:45, "Meethu Mathew" <me...@flytxt.com> wrote:

+1
We use Python 2.7


Regards,
 
Meethu Mathew


On Tue, Jan 5, 2016 at 12:47 PM, Reynold Xin <rx...@databricks.com> wrote:

Does anybody here care about us dropping support for Python 2.6 in Spark 2.0? 


Python 2.6 is ancient, and is pretty slow in many aspects (e.g. json parsing) when compared with Python 2.7. Some libraries that Spark depend on stopped supporting 2.6. We can still convince the library maintainers to support 2.6, but it will be extra work. I'm curious if anybody still uses Python 2.6 to run Spark.


Thanks.






Re: [discuss] dropping Python 2.6 support

Posted by Meethu Mathew <me...@flytxt.com>.
+1
We use Python 2.7

Regards,

Meethu Mathew

On Tue, Jan 5, 2016 at 12:47 PM, Reynold Xin <rx...@databricks.com> wrote:

> Does anybody here care about us dropping support for Python 2.6 in Spark
> 2.0?
>
> Python 2.6 is ancient, and is pretty slow in many aspects (e.g. json
> parsing) when compared with Python 2.7. Some libraries that Spark depend on
> stopped supporting 2.6. We can still convince the library maintainers to
> support 2.6, but it will be extra work. I'm curious if anybody still uses
> Python 2.6 to run Spark.
>
> Thanks.
>
>
>

Re: [discuss] dropping Python 2.6 support

Posted by Sean Owen <so...@cloudera.com>.
+juliet for an additional opinion, but FWIW I think it's safe to say
that future CDH will have a more consistent Python story and that
story will support 2.7 rather than 2.6.

On Tue, Jan 5, 2016 at 7:17 AM, Reynold Xin <rx...@databricks.com> wrote:
> Does anybody here care about us dropping support for Python 2.6 in Spark
> 2.0?
>
> Python 2.6 is ancient, and is pretty slow in many aspects (e.g. json
> parsing) when compared with Python 2.7. Some libraries that Spark depend on
> stopped supporting 2.6. We can still convince the library maintainers to
> support 2.6, but it will be extra work. I'm curious if anybody still uses
> Python 2.6 to run Spark.
>
> Thanks.
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: [discuss] dropping Python 2.6 support

Posted by Sasha Kacanski <sk...@gmail.com>.
+1
Companies that use the stock Python 2.6 in Red Hat will need to upgrade or
install a fresh version, which is a total of 3.5 minutes, so no issues ...

On Tue, Jan 5, 2016 at 2:17 AM, Reynold Xin <rx...@databricks.com> wrote:

> Does anybody here care about us dropping support for Python 2.6 in Spark
> 2.0?
>
> Python 2.6 is ancient, and is pretty slow in many aspects (e.g. json
> parsing) when compared with Python 2.7. Some libraries that Spark depend on
> stopped supporting 2.6. We can still convince the library maintainers to
> support 2.6, but it will be extra work. I'm curious if anybody still uses
> Python 2.6 to run Spark.
>
> Thanks.
>
>
>


-- 
Aleksandar Kacanski

Re: [discuss] dropping Python 2.6 support

Posted by Jim Lohse <sp...@megalearningllc.com>.
Hey Python 2.6, don't let the door hit you on the way out! Haha, drop it,
no problem.

On 01/05/2016 12:17 AM, Reynold Xin wrote:
> Does anybody here care about us dropping support for Python 2.6 in 
> Spark 2.0?
>
> Python 2.6 is ancient, and is pretty slow in many aspects (e.g. json 
> parsing) when compared with Python 2.7. Some libraries that Spark 
> depend on stopped supporting 2.6. We can still convince the library 
> maintainers to support 2.6, but it will be extra work. I'm curious if 
> anybody still uses Python 2.6 to run Spark.
>
> Thanks.
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org

