Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/10/30 15:41:49 UTC

[GitHub] [spark] sarutak opened a new pull request #34449: [SPARK-37170][PYTHON][DOCS] Pin PySpark version for Binder

sarutak opened a new pull request #34449:
URL: https://github.com/apache/spark/pull/34449


   ### What changes were proposed in this pull request?
   This PR proposes to pin the version of PySpark to be installed in the live notebook environment.
   
   ### Why are the changes needed?
   I noticed that PySpark `3.1.2` is installed in the live notebook environment even though the notebooks are for PySpark `3.2.0`.
   http://spark.apache.org/docs/3.2.0/api/python/getting_started/index.html
   
   I suspect someone accessed Binder and built the container image for `v3.2.0` before we published the `pyspark` package to PyPI.
   https://mybinder.org/
   
   Rebuilding the image manually seems difficult, so I propose this change to avoid such accidents.
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   Confirmed that Binder no longer builds the container image with an unexpected version of `pyspark`; instead, the build fails:
   ```
   ...
     Downloading plotly-5.3.1-py2.py3-none-any.whl (23.9 MB)
   ERROR: Could not find a version that satisfies the requirement pyspark[ml,mllib,pandas_on_spark,sql]==3.3.0.dev0 (from versions: 2.1.2, 2.1.3, 2.2.0.post0, 2.2.1, 2.2.2, 2.2.3, 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4, 2.4.5, 2.4.6, 2.4.7, 2.4.8, 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.1, 3.1.2, 3.2.0)
   ERROR: No matching distribution found for pyspark[ml,mllib,pandas_on_spark,sql]==3.3.0.dev0
   Removing intermediate container 39ed900c3890
   The command '/bin/sh -c ./binder/postBuild' returned a non-zero code: 1Built image, launching...
   Failed to connect to event stream
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sarutak commented on pull request #34449: [SPARK-37170][PYTHON][DOCS] Pin PySpark version for Binder

Posted by GitBox <gi...@apache.org>.
sarutak commented on pull request #34449:
URL: https://github.com/apache/spark/pull/34449#issuecomment-955352498


   cc: @HyukjinKwon 




[GitHub] [spark] HyukjinKwon edited a comment on pull request #34449: [SPARK-37170][PYTHON][DOCS] Pin PySpark version installed in the Binder environment for tagged commit

Posted by GitBox <gi...@apache.org>.
HyukjinKwon edited a comment on pull request #34449:
URL: https://github.com/apache/spark/pull/34449#issuecomment-955835593


   Merged to master and branch-3.2.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34449: [SPARK-37170][PYTHON][DOCS] Pin PySpark version installed in the Binder environment for tagged commit

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34449:
URL: https://github.com/apache/spark/pull/34449#issuecomment-955701953


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/144789/
   




[GitHub] [spark] HyukjinKwon closed pull request #34449: [SPARK-37170][PYTHON][DOCS] Pin PySpark version installed in the Binder environment for tagged commit

Posted by GitBox <gi...@apache.org>.
HyukjinKwon closed pull request #34449:
URL: https://github.com/apache/spark/pull/34449


   




[GitHub] [spark] HyukjinKwon commented on a change in pull request #34449: [SPARK-37170][PYTHON][DOCS] Pin PySpark version for Binder

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #34449:
URL: https://github.com/apache/spark/pull/34449#discussion_r739745124



##########
File path: binder/postBuild
##########
@@ -21,4 +21,4 @@
 # Jupyter notebook.
 
 VERSION=$(python -c "exec(open('python/pyspark/version.py').read()); print(__version__)")
-pip install plotly "pyspark[sql,ml,mllib,pandas_on_spark]<=$VERSION"
+pip install plotly "pyspark[sql,ml,mllib,pandas_on_spark]==$VERSION"

Review comment:
       Oh okay, got it now. You mean we should merge this, rebuild, and then revert this change to retrigger the Binder image build?






[GitHub] [spark] SparkQA commented on pull request #34449: [SPARK-37170][PYTHON][DOCS] Pin PySpark version installed in the Binder environment for tagged commit

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34449:
URL: https://github.com/apache/spark/pull/34449#issuecomment-955701604


   **[Test build #144789 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144789/testReport)** for PR 34449 at commit [`341a937`](https://github.com/apache/spark/commit/341a937c11094d22a442461a1e336910f8de5d33).
    * This patch **fails PySpark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #34449: [SPARK-37170][PYTHON][DOCS] Pin PySpark version for Binder

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34449:
URL: https://github.com/apache/spark/pull/34449#issuecomment-955579323


   **[Test build #144785 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144785/testReport)** for PR 34449 at commit [`d3328a7`](https://github.com/apache/spark/commit/d3328a7ce41cf9268f5621b11e0fce8a783d858e).
    * This patch **fails SparkR unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] sarutak commented on a change in pull request #34449: [SPARK-37170][PYTHON][DOCS] Pin PySpark version for Binder

Posted by GitBox <gi...@apache.org>.
sarutak commented on a change in pull request #34449:
URL: https://github.com/apache/spark/pull/34449#discussion_r739780580



##########
File path: binder/postBuild
##########
@@ -21,4 +21,4 @@
 # Jupyter notebook.
 
 VERSION=$(python -c "exec(open('python/pyspark/version.py').read()); print(__version__)")
-pip install plotly "pyspark[sql,ml,mllib,pandas_on_spark]<=$VERSION"

Review comment:
       Actually, I already tried defining `binder/start` and running `pip install` there to ensure that the expected version of the `pyspark` package is installed. But downloading and installing the `pyspark` package sometimes takes over 30 seconds, so launching the live notebook fails.






[GitHub] [spark] SparkQA commented on pull request #34449: [SPARK-37170][PYTHON][DOCS] Pin PySpark version for Binder

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34449:
URL: https://github.com/apache/spark/pull/34449#issuecomment-955418554


   **[Test build #144785 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144785/testReport)** for PR 34449 at commit [`d3328a7`](https://github.com/apache/spark/commit/d3328a7ce41cf9268f5621b11e0fce8a783d858e).




[GitHub] [spark] HyukjinKwon commented on a change in pull request #34449: [SPARK-37170][PYTHON][DOCS] Pin PySpark version for Binder

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #34449:
URL: https://github.com/apache/spark/pull/34449#discussion_r739744975



##########
File path: binder/postBuild
##########
@@ -21,4 +21,4 @@
 # Jupyter notebook.
 
 VERSION=$(python -c "exec(open('python/pyspark/version.py').read()); print(__version__)")
-pip install plotly "pyspark[sql,ml,mllib,pandas_on_spark]<=$VERSION"
+pip install plotly "pyspark[sql,ml,mllib,pandas_on_spark]==$VERSION"

Review comment:
       I think we should instead do something like a force reinstall.






[GitHub] [spark] AmplabJenkins commented on pull request #34449: [SPARK-37170][PYTHON][DOCS] Pin PySpark version for Binder

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34449:
URL: https://github.com/apache/spark/pull/34449#issuecomment-955573009


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/49254/
   




[GitHub] [spark] SparkQA commented on pull request #34449: [SPARK-37170][PYTHON][DOCS] Pin PySpark version for Binder

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34449:
URL: https://github.com/apache/spark/pull/34449#issuecomment-955492548


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49254/
   




[GitHub] [spark] SparkQA commented on pull request #34449: [SPARK-37170][PYTHON][DOCS] Pin PySpark version for Binder

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34449:
URL: https://github.com/apache/spark/pull/34449#issuecomment-955562057


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49254/
   




[GitHub] [spark] SparkQA commented on pull request #34449: [SPARK-37170][PYTHON][DOCS] Pin PySpark version installed in the Binder environment for tagged commit

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34449:
URL: https://github.com/apache/spark/pull/34449#issuecomment-955698353


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49259/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34449: [SPARK-37170][PYTHON][DOCS] Pin PySpark version installed in the Binder environment for tagged commit

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34449:
URL: https://github.com/apache/spark/pull/34449#issuecomment-955700365


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/49259/
   




[GitHub] [spark] SparkQA commented on pull request #34449: [SPARK-37170][PYTHON][DOCS] Pin PySpark version installed in the Binder environment for tagged commit

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34449:
URL: https://github.com/apache/spark/pull/34449#issuecomment-955682437


   **[Test build #144789 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144789/testReport)** for PR 34449 at commit [`341a937`](https://github.com/apache/spark/commit/341a937c11094d22a442461a1e336910f8de5d33).




[GitHub] [spark] sarutak commented on a change in pull request #34449: [SPARK-37170][PYTHON][DOCS] Pin PySpark version for Binder

Posted by GitBox <gi...@apache.org>.
sarutak commented on a change in pull request #34449:
URL: https://github.com/apache/spark/pull/34449#discussion_r739767873



##########
File path: binder/postBuild
##########
@@ -21,4 +21,4 @@
 # Jupyter notebook.
 
 VERSION=$(python -c "exec(open('python/pyspark/version.py').read()); print(__version__)")
-pip install plotly "pyspark[sql,ml,mllib,pandas_on_spark]<=$VERSION"

Review comment:
       @HyukjinKwon Let me explain again how the problem happens, just in case. Imagine the following situation.
   
   1. Spark `X.Y.Z-rcN`, which refers to the commit hash `abcde`, is in its voting period.
   2. Someone accesses Binder and builds the container image with the commit hash `abcde` or an equivalent tag (e.g. `rcN`). The image contains `pyspark`, but its version is not `X.Y.Z` because `pyspark-X.Y.Z` has not been published yet.
   3. `rcN` passes the vote and `pyspark-X.Y.Z` is published to PyPI. But the container image in Binder is not rebuilt because the commit hash has not changed.
   
   As a result, the live notebook environment linked from the `X.Y.Z` documentation doesn't contain `pyspark-X.Y.Z`, even though it contains the `X.Y.Z` notebooks.
   
   Can we prevent this issue with `git describe --tags --exact-match`?
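
   To make the question concrete, here is a minimal sketch in a throwaway repository of what `git describe --tags --exact-match` reports on a tagged commit and on a later, untagged one. The tag name `v9.9.9-rc1` and the commit messages are made up for illustration.

   ```shell
   set -eu

   # Throwaway repository, removed when the script exits.
   repo=$(mktemp -d)
   trap 'rm -rf "$repo"' EXIT
   git -C "$repo" init -q

   # Identity is passed via -c so the demo does not depend on global git config.
   git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
     commit -q --allow-empty -m "commit referred to by the RC"
   git -C "$repo" tag v9.9.9-rc1   # hypothetical RC tag

   # On the tagged commit, the command prints the tag and exits 0.
   tagged_result=$(git -C "$repo" describe --tags --exact-match)
   echo "tagged commit: $tagged_result"

   # After one more commit, HEAD is no longer exactly on a tag, so the
   # command exits non-zero (its error message is discarded here).
   git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
     commit -q --allow-empty -m "later commit"
   if untagged_result=$(git -C "$repo" describe --tags --exact-match 2>/dev/null); then
     echo "unexpected tag: $untagged_result"
   else
     untagged_result="(no exact tag)"
   fi
   echo "untagged commit: $untagged_result"
   ```

   Because the command exits non-zero whenever `HEAD` is not exactly on a tag, a build script can branch on its exit status alone.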






[GitHub] [spark] SparkQA removed a comment on pull request #34449: [SPARK-37170][PYTHON][DOCS] Pin PySpark version installed in the Binder environment for tagged commit

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34449:
URL: https://github.com/apache/spark/pull/34449#issuecomment-955682437


   **[Test build #144789 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144789/testReport)** for PR 34449 at commit [`341a937`](https://github.com/apache/spark/commit/341a937c11094d22a442461a1e336910f8de5d33).




[GitHub] [spark] HyukjinKwon commented on pull request #34449: [SPARK-37170][PYTHON][DOCS] Pin PySpark version installed in the Binder environment for tagged commit

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #34449:
URL: https://github.com/apache/spark/pull/34449#issuecomment-955836587


   Thanks @sarutak for fixing this. These live notebooks are user-facing and sit at the very entry point. We might need to roll Spark 3.2.1 soon; there seem to be some correctness issues too, and a regression like SPARK-37004.




[GitHub] [spark] HyukjinKwon commented on pull request #34449: [SPARK-37170][PYTHON][DOCS] Pin PySpark version installed in the Binder environment for tagged commit

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #34449:
URL: https://github.com/apache/spark/pull/34449#issuecomment-955835593


   Merged to master, branch-3.2 and branch-3.1.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34449: [SPARK-37170][PYTHON][DOCS] Pin PySpark version for Binder

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34449:
URL: https://github.com/apache/spark/pull/34449#issuecomment-955580078


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/144785/
   




[GitHub] [spark] HyukjinKwon commented on a change in pull request #34449: [SPARK-37170][PYTHON][DOCS] Pin PySpark version for Binder

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #34449:
URL: https://github.com/apache/spark/pull/34449#discussion_r739745310



##########
File path: binder/postBuild
##########
@@ -21,4 +21,4 @@
 # Jupyter notebook.
 
 VERSION=$(python -c "exec(open('python/pyspark/version.py').read()); print(__version__)")
-pip install plotly "pyspark[sql,ml,mllib,pandas_on_spark]<=$VERSION"

Review comment:
       @sarutak can we check via a git command whether we're on a tag, e.g. `git describe --tags --exact-match`? After this change, manual builds made before the release won't be able to use Binder.






[GitHub] [spark] HyukjinKwon commented on a change in pull request #34449: [SPARK-37170][PYTHON][DOCS] Pin PySpark version for Binder

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #34449:
URL: https://github.com/apache/spark/pull/34449#discussion_r739744937



##########
File path: binder/postBuild
##########
@@ -21,4 +21,4 @@
 # Jupyter notebook.
 
 VERSION=$(python -c "exec(open('python/pyspark/version.py').read()); print(__version__)")
-pip install plotly "pyspark[sql,ml,mllib,pandas_on_spark]<=$VERSION"
+pip install plotly "pyspark[sql,ml,mllib,pandas_on_spark]==$VERSION"

Review comment:
       Oh, actually it was intentional, because `VERSION` is something like `3.3.0.dev0` while the released PySpark is something like `3.2.0`.






[GitHub] [spark] sarutak commented on a change in pull request #34449: [SPARK-37170][PYTHON][DOCS] Pin PySpark version for Binder

Posted by GitBox <gi...@apache.org>.
sarutak commented on a change in pull request #34449:
URL: https://github.com/apache/spark/pull/34449#discussion_r739790261



##########
File path: binder/postBuild
##########
@@ -21,4 +21,4 @@
 # Jupyter notebook.
 
 VERSION=$(python -c "exec(open('python/pyspark/version.py').read()); print(__version__)")
-pip install plotly "pyspark[sql,ml,mllib,pandas_on_spark]<=$VERSION"

Review comment:
       Ah, OK. I understand: if a commit is tagged, then exactly the specified version of `pyspark` should be installed.






[GitHub] [spark] AmplabJenkins commented on pull request #34449: [SPARK-37170][PYTHON][DOCS] Pin PySpark version installed in the Binder environment for tagged commit

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34449:
URL: https://github.com/apache/spark/pull/34449#issuecomment-955700365


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/49259/
   




[GitHub] [spark] HyukjinKwon commented on a change in pull request #34449: [SPARK-37170][PYTHON][DOCS] Pin PySpark version for Binder

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #34449:
URL: https://github.com/apache/spark/pull/34449#discussion_r739786825



##########
File path: binder/postBuild
##########
@@ -21,4 +21,4 @@
 # Jupyter notebook.
 
 VERSION=$(python -c "exec(open('python/pyspark/version.py').read()); print(__version__)")
-pip install plotly "pyspark[sql,ml,mllib,pandas_on_spark]<=$VERSION"

Review comment:
       Oh yeah, your fix looks good for tagged commits. So I wondered whether we could apply your change only when a commit has a tag.
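A minimal sketch of that idea for `binder/postBuild`, assuming the Binder build runs inside a git checkout where tags are visible. `git describe --tags --exact-match HEAD` succeeds only when `HEAD` itself carries a tag. The helper names `is_tagged_commit` and `pyspark_requirement` are hypothetical; in `postBuild`, `VERSION` would come from the existing `python -c` line.

```shell
#!/usr/bin/env bash
# Sketch only: choose the pyspark requirement depending on whether the
# current commit is exactly a tag. Helper names are hypothetical.

# Succeeds (exit 0) only if HEAD is exactly a tagged commit.
# Outside a git checkout, or without git, it fails, so we fall back to "<=".
is_tagged_commit() {
  git describe --tags --exact-match HEAD >/dev/null 2>&1
}

# Prints the pip requirement: exact pin for tagged commits, upper bound otherwise.
pyspark_requirement() {
  local version="$1"
  local op="<="
  if is_tagged_commit; then
    op="=="
  fi
  echo "pyspark[sql,ml,mllib,pandas_on_spark]${op}${version}"
}

# Usage in postBuild (commented out so the sketch has no side effects):
# VERSION=$(python -c "exec(open('python/pyspark/version.py').read()); print(__version__)")
# pip install plotly "$(pyspark_requirement "$VERSION")"
```

One design consequence: with an exact pin, building a tagged commit before the package is on PyPI makes `pip install` fail instead of quietly installing an older release.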






[GitHub] [spark] HyukjinKwon commented on a change in pull request #34449: [SPARK-37170][PYTHON][DOCS] Pin PySpark version for Binder

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #34449:
URL: https://github.com/apache/spark/pull/34449#discussion_r739744937



##########
File path: binder/postBuild
##########
@@ -21,4 +21,4 @@
 # Jupyter notebook.
 
 VERSION=$(python -c "exec(open('python/pyspark/version.py').read()); print(__version__)")
-pip install plotly "pyspark[sql,ml,mllib,pandas_on_spark]<=$VERSION"
+pip install plotly "pyspark[sql,ml,mllib,pandas_on_spark]==$VERSION"

Review comment:
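       Oh, actually it was intentional: on the development branch `VERSION` is something like `3.3.0.dev`, but the released PySpark is something like `3.2.0`. Pinning with `==` would ask pip for a development version that is not available on PyPI, whereas `<=` lets it fall back to the latest release.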
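A possible alternative heuristic, sketched here for comparison and not the approach taken in this PR: decide the specifier from the version string itself, keeping `<=` only for development versions. The function name `specifier_for_version` is hypothetical, and the `.dev` pattern assumes development versions follow the `X.Y.Z.devN` form mentioned above.

```shell
#!/usr/bin/env bash
# Sketch of an alternative heuristic (not the approach chosen in this PR):
# derive the pip specifier from the version string instead of git tags.

# Prints "<=" for development versions such as 3.3.0.dev0, which are
# generally not on PyPI, and "==" for release versions such as 3.2.0.
specifier_for_version() {
  case "$1" in
    *.dev*) echo "<=" ;;
    *)      echo "==" ;;
  esac
}

# Usage in postBuild (commented out so the sketch has no side effects):
# VERSION=$(python -c "exec(open('python/pyspark/version.py').read()); print(__version__)")
# pip install plotly "pyspark[sql,ml,mllib,pandas_on_spark]$(specifier_for_version "$VERSION")$VERSION"
```

This needs no git metadata at build time; the trade-off is that it trusts whatever version string the commit's `version.py` carries.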






[GitHub] [spark] SparkQA commented on pull request #34449: [SPARK-37170][PYTHON][DOCS] Pin PySpark version installed in the Binder environment for tagged commit

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34449:
URL: https://github.com/apache/spark/pull/34449#issuecomment-955688561


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49259/
   




[GitHub] [spark] AmplabJenkins commented on pull request #34449: [SPARK-37170][PYTHON][DOCS] Pin PySpark version installed in the Binder environment for tagged commit

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34449:
URL: https://github.com/apache/spark/pull/34449#issuecomment-955701953


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/144789/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34449: [SPARK-37170][PYTHON][DOCS] Pin PySpark version for Binder

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34449:
URL: https://github.com/apache/spark/pull/34449#issuecomment-955573009


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/49254/
   




[GitHub] [spark] SparkQA removed a comment on pull request #34449: [SPARK-37170][PYTHON][DOCS] Pin PySpark version for Binder

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34449:
URL: https://github.com/apache/spark/pull/34449#issuecomment-955418554


   **[Test build #144785 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144785/testReport)** for PR 34449 at commit [`d3328a7`](https://github.com/apache/spark/commit/d3328a7ce41cf9268f5621b11e0fce8a783d858e).




[GitHub] [spark] AmplabJenkins commented on pull request #34449: [SPARK-37170][PYTHON][DOCS] Pin PySpark version for Binder

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34449:
URL: https://github.com/apache/spark/pull/34449#issuecomment-955580078


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/144785/
   




[GitHub] [spark] sarutak commented on pull request #34449: [SPARK-37170][PYTHON][DOCS] Pin PySpark version installed in the Binder environment for tagged commit

Posted by GitBox <gi...@apache.org>.
sarutak commented on pull request #34449:
URL: https://github.com/apache/spark/pull/34449#issuecomment-955683117


   This issue can still happen with the latest change if a container image is built for a commit before that commit is tagged. But I think it's a reasonable compromise.

