Posted to issues@spark.apache.org by "shane knapp (JIRA)" <ji...@apache.org> on 2018/11/02 19:27:01 UTC

[jira] [Comment Edited] (SPARK-25079) [PYTHON] upgrade python 3.4 -> 3.5

    [ https://issues.apache.org/jira/browse/SPARK-25079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16673591#comment-16673591 ] 

shane knapp edited comment on SPARK-25079 at 11/2/18 7:26 PM:
--------------------------------------------------------------

i think we're ready to deploy python3.5 (which will allow us to test pyarrow 0.10.0).

i created a working python 3.5 dist on my staging worker, then set up a build that:

1) scps over a hacked python/run-tests.py that adds the python 3.5 executable to the interpreter list (a sketch follows the console link below)

2) builds spark:  `./build/mvn -DskipTests -Phadoop2.7 -Pyarn -Phive -Phive-thriftserver clean package`

3) runs the python tests:  `./python/run-tests`

AND VOILA!  IT WORKS!

[https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.6-python-3.5-arrow-0.10.0-ubuntu-testing/93/console]
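
for reference, the step 1 hack amounts to a one-line edit of the interpreter list in python/run-tests.py (the current line shows up in the grep output under [1]); a minimal sketch, assuming a simple in-place splice is all that's needed:

{noformat}
# sketch of the step 1 hack: splice python3.5 into the list of
# interpreters that python/run-tests.py probes for
sed -i 's/"python3.4",/"python3.4", "python3.5",/' python/run-tests.py
{noformat}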

i've already staged the python3.5 environment on all of the ubuntu workers, so here are my next steps:

1) advertise the switch-over on the dev list

2) update the remaining python 3.4 references in the repo to point to python 3.5 [1] (a bulk-edit sketch follows the grep output)

3) delete the existing py3k anaconda env, then clone the 3.5 env to py3k (sketched below)
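
a rough sketch of step 3, assuming the staged 3.5 env is named py35 (the actual env name may differ):

{noformat}
# delete the existing py3k env, then clone the staged 3.5 env to py3k
# (py35 is a guessed name for the staged 3.5 env)
conda remove --name py3k --all
conda create --name py3k --clone py35
{noformat}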

 

[~bryanc] [~srowen] [~yhuai]

[1]:
{noformat}
➜ spark git:(master) grep -rw "python3\.4" *
core/src/test/java/org/apache/spark/launcher/SparkLauncherSuite.java: launcher.setConf(SparkLauncher.PYSPARK_DRIVER_PYTHON, "python3.4");
core/src/test/java/org/apache/spark/launcher/SparkLauncherSuite.java: assertEquals("python3.4", launcher.builder.conf.get(
core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala: "--conf", "spark.pyspark.driver.python=python3.4",
core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala: conf3.get(PYSPARK_DRIVER_PYTHON.key) should be ("python3.4")
docs/rdd-programming-guide.md:$ PYSPARK_PYTHON=python3.4 bin/pyspark
python/run-tests.py: python_execs = [x for x in ["python2.7", "python3.4", "pypy"] if which(x)]
➜ spark git:(master) grep -r "py3k" *
dev/run-tests.py: os.environ["PATH"] = "/home/anaconda/envs/py3k/bin:" + os.environ.get("PATH")
{noformat}
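
and a hedged sketch of the step 2 bulk edit, assuming a plain find-and-replace across the files grep found above is all that's needed (the Java/Scala test files get the same substitution):

{noformat}
# rewrite every python3.4 reference found above to python3.5
grep -rlw "python3\.4" core docs python | xargs sed -i 's/python3\.4/python3.5/g'
{noformat}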



> [PYTHON] upgrade python 3.4 -> 3.5
> ----------------------------------
>
>                 Key: SPARK-25079
>                 URL: https://issues.apache.org/jira/browse/SPARK-25079
>             Project: Spark
>          Issue Type: Improvement
>          Components: Build, PySpark
>    Affects Versions: 2.3.1
>            Reporter: shane knapp
>            Assignee: shane knapp
>            Priority: Major
>
> for the impending arrow upgrade (https://issues.apache.org/jira/browse/SPARK-23874) we need to bump python 3.4 -> 3.5.
> i have been testing this here: https://amplab.cs.berkeley.edu/jenkins/view/RISELab%20Infra/job/ubuntuSparkPRB/69
> my methodology:
> 1) upgrade python + arrow to 3.5 and 0.10.0
> 2) run python tests
> 3) when i'm happy that Things Won't Explode Spectacularly, pause jenkins and upgrade centos workers to python3.5
> 4) simultaneously do the following: 
>   - create a symlink in /home/anaconda/envs/py3k/bin for python3.4 that points to python3.5 (this is currently being tested here: https://amplab.cs.berkeley.edu/jenkins/view/RISELab%20Infra/job/ubuntuSparkPRB/69; see the sketch after this list)
>   - push a change to python/run-tests.py replacing 3.4 with 3.5
> 5) once the python3.5 change to run-tests.py is merged, we will need to back-port this to all existing branches
> 6) then and only then can i remove the python3.4 -> python3.5 symlink
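>
> a minimal sketch of the step 4 symlink, assuming the env's bin dir already carries a python3.5 binary (-sf keeps re-runs idempotent):
> {noformat}
> # point the old interpreter name at the new one inside the py3k env
> ln -sf /home/anaconda/envs/py3k/bin/python3.5 /home/anaconda/envs/py3k/bin/python3.4
> {noformat}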


