Posted to dev@spark.apache.org by Josh Rosen <ro...@gmail.com> on 2015/04/17 01:46:26 UTC

Python 3 support for PySpark has been merged into master

Hi everyone,

We just merged Python 3 support for PySpark into Spark's master branch
(which will become Spark 1.4.0).  This means that PySpark now supports
Python 2.6+, PyPy 2.5+, and Python 3.4+.

To run with Python 3, download and build Spark from the master branch, then
set the PYSPARK_PYTHON environment variable to point to a Python 3.4
executable.  For example:

PYSPARK_PYTHON=python3.4 ./bin/pyspark


For more details on this feature, see the pull request and JIRA:

- https://github.com/apache/spark/pull/5173
- https://issues.apache.org/jira/browse/SPARK-4897

For Spark contributors, this change means that any open PySpark pull
requests are now likely to have merge conflicts.  If a pull request does
not have merge conflicts, we should still re-test it with Jenkins to check
that it still works under Python 3.  When backporting Python patches,
committers may wish to run the PySpark unit tests locally to make sure that
the changes still work correctly in older branches.  I can also help with
backports / fixing conflicts.

Thanks to Davies Liu, Shane Knapp, Thom Neale, Xiangrui Meng, and everyone
else who helped with this patch.

- Josh