Posted to reviews@spark.apache.org by HyukjinKwon <gi...@git.apache.org> on 2018/02/02 13:58:09 UTC

[GitHub] spark pull request #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tes...

GitHub user HyukjinKwon opened a pull request:

    https://github.com/apache/spark/pull/20487

    [SPARK-23319][TESTS] Explicitly skips PySpark tests for old Pandas and PyArrow

    ## What changes were proposed in this pull request?
    
    This PR proposes to explicitly skip the tests for old Pandas and PyArrow.
    
    We declared the extra dependencies:
    
    https://github.com/apache/spark/blob/b8bfce51abf28c66ba1fc67b0f25fe1617c81025/python/setup.py#L204
    
    but currently we only check whether PyArrow is installed, without checking its version. With an older PyArrow, the tests already fail to run.
    
    Also, we have a conditional skip for old Pandas. It seems we specify the condition as Pandas >= 0.19.2.
    
    ## How was this patch tested?
    
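    Manually tested by temporarily modifying the conditions (for example, raising the required versions to 1.19.2 and 1.8.0 to trigger the version skips):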
    
    ```
    test_createDataFrame_column_name_encoding (pyspark.sql.tests.ArrowTests) ... skipped 'Pandas >= 1.19.2 must be installed; however, your version was 0.19.2.'
    test_createDataFrame_does_not_modify_input (pyspark.sql.tests.ArrowTests) ... skipped 'Pandas >= 1.19.2 must be installed; however, your version was 0.19.2.'
    test_createDataFrame_respect_session_timezone (pyspark.sql.tests.ArrowTests) ... skipped 'Pandas >= 1.19.2 must be installed; however, your version was 0.19.2.'
    ```
    
    ```
    test_createDataFrame_column_name_encoding (pyspark.sql.tests.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
    test_createDataFrame_does_not_modify_input (pyspark.sql.tests.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
    test_createDataFrame_respect_session_timezone (pyspark.sql.tests.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
    ```
    
    ```
    test_createDataFrame_column_name_encoding (pyspark.sql.tests.ArrowTests) ... skipped 'PyArrow >= 1.8.0 must be installed; however, your version was 0.8.0.'
    test_createDataFrame_does_not_modify_input (pyspark.sql.tests.ArrowTests) ... skipped 'PyArrow >= 1.8.0 must be installed; however, your version was 0.8.0.'
    test_createDataFrame_respect_session_timezone (pyspark.sql.tests.ArrowTests) ... skipped 'PyArrow >= 1.8.0 must be installed; however, your version was 0.8.0.'
    ```
    
    ```
    test_createDataFrame_column_name_encoding (pyspark.sql.tests.ArrowTests) ... skipped 'PyArrow >= 0.8.0 must be installed; however, it was not found.'
    test_createDataFrame_does_not_modify_input (pyspark.sql.tests.ArrowTests) ... skipped 'PyArrow >= 0.8.0 must be installed; however, it was not found.'
    test_createDataFrame_respect_session_timezone (pyspark.sql.tests.ArrowTests) ... skipped 'PyArrow >= 0.8.0 must be installed; however, it was not found.'
    ```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark pyarrow-pandas-skip

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20487.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20487
    
----
commit 08b42f80322636169fc440e0e2f36819b8d6e837
Author: hyukjinkwon <gu...@...>
Date:   2018-02-02T13:21:34Z

    Explicitly skips PySpark tests for old Pandas and PyArrow

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tes...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20487#discussion_r165873671
  
    --- Diff: python/setup.py ---
    @@ -100,6 +100,11 @@ def _supports_symlinks():
                   file=sys.stderr)
             exit(-1)
     
    +# If you are changing the versions here, please also change ./python/pyspark/sql/utils.py and
    +# ./python/run-tests.py. In case of Arrow, you should also check ./pom.xml.
    --- End diff --
    
    Ditto; same comment as https://github.com/apache/spark/pull/20487/files#r165873632
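The comment in the diff asks contributors to keep the minimum versions in `python/setup.py`, `python/pyspark/sql/utils.py` and `python/run-tests.py` in sync by hand. A minimal sketch of declaring them once instead, assuming a hypothetical shared module that each of those files could read: the constant names, the `sql_extras_require` helper and the `"sql"` extra name are assumptions for illustration, and the version numbers are taken from the skip messages in this PR.

```python
# Hypothetical shared module: the one place where the minimum versions are declared.
MINIMUM_PANDAS_VERSION = "0.19.2"
MINIMUM_PYARROW_VERSION = "0.8.0"


def sql_extras_require():
    """Build an extras_require mapping for setup.py from the shared constants."""
    return {
        "sql": [
            "pandas>=%s" % MINIMUM_PANDAS_VERSION,
            "pyarrow>=%s" % MINIMUM_PYARROW_VERSION,
        ],
    }


if __name__ == "__main__":
    # setup.py would pass this as setup(..., extras_require=sql_extras_require()).
    print(sql_extras_require())
```

One caveat: `setup.py` runs before the package is installed, so it would typically load such a file without importing the package (for example, by reading and `exec`-ing it) to avoid pulling in runtime dependencies.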


---


[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tests for ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/528/


---


[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly specify Pandas and PyArr...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    **[Test build #87134 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87134/testReport)** for PR 20487 at commit [`b7a940d`](https://github.com/apache/spark/commit/b7a940d159344d372cdeb56894e598b141a6dcff).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tests for ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86997/


---


[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tests for ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/589/


---


[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly specify Pandas and PyArr...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tests for ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87066/


---


[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tests for ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tests for ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    **[Test build #87057 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87057/testReport)** for PR 20487 at commit [`a0e4b16`](https://github.com/apache/spark/commit/a0e4b166f71f9bb5f3e5af7843a03c11658892fd).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark pull request #20487: [SPARK-23319][TESTS] Explicitly specify Pandas an...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20487#discussion_r166536018
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -646,6 +646,9 @@ def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=Tr
             except Exception:
                 has_pandas = False
             if has_pandas and isinstance(data, pandas.DataFrame):
    +            from pyspark.sql.utils import require_minimum_pandas_version
    +            require_minimum_pandas_version()
    --- End diff --
    
    Just out of curiosity, do you have a list of the places where we do this version check for Pandas and PyArrow?


---


[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tests for ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    **[Test build #87016 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87016/testReport)** for PR 20487 at commit [`6403198`](https://github.com/apache/spark/commit/640319812307b166f060366d54974c7352e3d7ba).


---
