Posted to issues@spark.apache.org by "shane knapp (JIRA)" <ji...@apache.org> on 2018/11/02 19:27:01 UTC
[jira] [Comment Edited] (SPARK-25079) [PYTHON] upgrade python 3.4 -> 3.5
[ https://issues.apache.org/jira/browse/SPARK-25079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16673591#comment-16673591 ]
shane knapp edited comment on SPARK-25079 at 11/2/18 7:26 PM:
--------------------------------------------------------------
i think we're ready to deploy python3.5 (which will allow us to test pyarrow 0.10.0).
i created a working python 3.5 dist on my staging worker, then set up a build that:
1) scps over a hacked python/run-tests.py that adds the python 3.5 executable to the list
2) builds spark: `./build/mvn -DskipTests -Phadoop2.7 -Pyarn -Phive -Phive-thriftserver clean package`
3) runs the python tests: `./python/run-tests`
AND VOILA! IT WORKS!
[https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.6-python-3.5-arrow-0.10.0-ubuntu-testing/93/console]
i've already staged the python3.5 environment on all of the ubuntu workers, so here are my next steps:
1) advertise the switch-over on the dev list
2) update a bunch of stuff in the repo to point to python 3.5 [1]
3) delete the existing py3k anaconda env, then clone the 3.5 env to py3k
[~bryanc] [~srowen] [~yhuai]
[1]:
{noformat}
➜ spark git:(master) grep -rw "python3\.4" *
core/src/test/java/org/apache/spark/launcher/SparkLauncherSuite.java: launcher.setConf(SparkLauncher.PYSPARK_DRIVER_PYTHON, "python3.4");
core/src/test/java/org/apache/spark/launcher/SparkLauncherSuite.java: assertEquals("python3.4", launcher.builder.conf.get(
core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala: "--conf", "spark.pyspark.driver.python=python3.4",
core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala: conf3.get(PYSPARK_DRIVER_PYTHON.key) should be ("python3.4")
docs/rdd-programming-guide.md:$ PYSPARK_PYTHON=python3.4 bin/pyspark
python/run-tests.py: python_execs = [x for x in ["python2.7", "python3.4", "pypy"] if which(x)]
➜ spark git:(master) grep -r "py3k" *
dev/run-tests.py: os.environ["PATH"] = "/home/anaconda/envs/py3k/bin:" + os.environ.get("PATH"){noformat}
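step 2 above boils down to a search-and-replace over the grep hits shown in [1]. a rough sketch of that rewrite, demonstrated on a scratch copy of the run-tests.py line rather than a real spark checkout (the temp dir and sample file are illustrative, and `sed -i` here assumes GNU sed as on the ubuntu workers):

```shell
# stand-in for a spark checkout, with one of the lines grep found above
tmpdir=$(mktemp -d)
printf '%s\n' 'python_execs = [x for x in ["python2.7", "python3.4", "pypy"] if which(x)]' \
  > "$tmpdir/run-tests.py"

# rewrite python3.4 -> python3.5 in every file the grep matches
grep -rlw 'python3\.4' "$tmpdir" | xargs sed -i 's/python3\.4/python3.5/g'

# the interpreter list now names python3.5
grep 'python3\.5' "$tmpdir/run-tests.py"
```

the same one-liner would cover the py3k PATH entry in dev/run-tests.py, since that path only changes if the env name changes.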
> [PYTHON] upgrade python 3.4 -> 3.5
> ----------------------------------
>
> Key: SPARK-25079
> URL: https://issues.apache.org/jira/browse/SPARK-25079
> Project: Spark
> Issue Type: Improvement
> Components: Build, PySpark
> Affects Versions: 2.3.1
> Reporter: shane knapp
> Assignee: shane knapp
> Priority: Major
>
> for the impending arrow upgrade (https://issues.apache.org/jira/browse/SPARK-23874) we need to bump python 3.4 -> 3.5.
> i have been testing this here: [https://amplab.cs.berkeley.edu/jenkins/view/RISELab%20Infra/job/ubuntuSparkPRB/|https://amplab.cs.berkeley.edu/jenkins/view/RISELab%20Infra/job/ubuntuSparkPRB/69]
> my methodology:
> 1) upgrade python + arrow to 3.5 and 0.10.0
> 2) run python tests
> 3) when i'm happy that Things Won't Explode Spectacularly, pause jenkins and upgrade centos workers to python3.5
> 4) simultaneously do the following:
> - create a symlink in /home/anaconda/envs/py3k/bin for python3.4 that points to python3.5 (this is currently being tested here: [https://amplab.cs.berkeley.edu/jenkins/view/RISELab%20Infra/job/ubuntuSparkPRB/69])
> - push a change to python/run-tests.py replacing 3.4 with 3.5
> 5) once the python3.5 change to run-tests.py is merged, we will need to back-port this to all existing branches
> 6) then and only then can i remove the python3.4 -> python3.5 symlink
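the symlink in step 4 above is just a python3.4 name resolving to the python3.5 binary, so anything still invoking python3.4 keeps working during the transition. a minimal sketch, demonstrated in a temp dir since the real path (/home/anaconda/envs/py3k/bin) is infra-specific:

```shell
# stand-in for /home/anaconda/envs/py3k/bin
bindir=$(mktemp -d)
touch "$bindir/python3.5"
chmod +x "$bindir/python3.5"

# -s makes a symlink, -f replaces any existing python3.4 entry
ln -sf "$bindir/python3.5" "$bindir/python3.4"

# confirm python3.4 now resolves to the 3.5 binary
readlink "$bindir/python3.4"
```

once every branch's run-tests.py asks for python3.5 directly (step 5), the symlink can be deleted without breaking anything (step 6).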
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org