Posted to reviews@spark.apache.org by HyukjinKwon <gi...@git.apache.org> on 2018/02/02 13:58:09 UTC

[GitHub] spark pull request #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tes...

GitHub user HyukjinKwon opened a pull request:

    https://github.com/apache/spark/pull/20487

    [SPARK-23319][TESTS] Explicitly skips PySpark tests for old Pandas and PyArrow

    ## What changes were proposed in this pull request?
    
    This PR proposes to explicitly skip the tests for old Pandas and PyArrow.
    
    We declared the extra dependencies:
    
    https://github.com/apache/spark/blob/b8bfce51abf28c66ba1fc67b0f25fe1617c81025/python/setup.py#L204
    
    but currently we only check whether pyarrow is installed, without checking its version. With an older PyArrow, the tests already fail to run.
    
    Also, we have a conditional skip for old Pandas. It seems the condition we specify is Pandas >= 0.19.2.
    
    ## How was this patch tested?
    
    Manually tested by modifying the condition:
    
    ```
    test_createDataFrame_column_name_encoding (pyspark.sql.tests.ArrowTests) ... skipped 'Pandas >= 1.19.2 must be installed; however, your version was 0.19.2.'
    test_createDataFrame_does_not_modify_input (pyspark.sql.tests.ArrowTests) ... skipped 'Pandas >= 1.19.2 must be installed; however, your version was 0.19.2.'
    test_createDataFrame_respect_session_timezone (pyspark.sql.tests.ArrowTests) ... skipped 'Pandas >= 1.19.2 must be installed; however, your version was 0.19.2.'
    ```
    
    ```
    test_createDataFrame_column_name_encoding (pyspark.sql.tests.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
    test_createDataFrame_does_not_modify_input (pyspark.sql.tests.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
    test_createDataFrame_respect_session_timezone (pyspark.sql.tests.ArrowTests) ... skipped 'Pandas >= 0.19.2 must be installed; however, it was not found.'
    ```
    
    ```
    test_createDataFrame_column_name_encoding (pyspark.sql.tests.ArrowTests) ... skipped 'PyArrow >= 1.8.0 must be installed; however, your version was 0.8.0.'
    test_createDataFrame_does_not_modify_input (pyspark.sql.tests.ArrowTests) ... skipped 'PyArrow >= 1.8.0 must be installed; however, your version was 0.8.0.'
    test_createDataFrame_respect_session_timezone (pyspark.sql.tests.ArrowTests) ... skipped 'PyArrow >= 1.8.0 must be installed; however, your version was 0.8.0.'
    ```
    
    ```
    test_createDataFrame_column_name_encoding (pyspark.sql.tests.ArrowTests) ... skipped 'PyArrow >= 0.8.0 must be installed; however, it was not found.'
    test_createDataFrame_does_not_modify_input (pyspark.sql.tests.ArrowTests) ... skipped 'PyArrow >= 0.8.0 must be installed; however, it was not found.'
    test_createDataFrame_respect_session_timezone (pyspark.sql.tests.ArrowTests) ... skipped 'PyArrow >= 0.8.0 must be installed; however, it was not found.'
    ```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark pyarrow-pandas-skip

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20487.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20487
    
----
commit 08b42f80322636169fc440e0e2f36819b8d6e837
Author: hyukjinkwon <gu...@...>
Date:   2018-02-02T13:21:34Z

    Explicitly skips PySpark tests for old Pandas and PyArrow

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tes...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20487#discussion_r165873671
  
    --- Diff: python/setup.py ---
    @@ -100,6 +100,11 @@ def _supports_symlinks():
                   file=sys.stderr)
             exit(-1)
     
    +# If you are changing the versions here, please also change ./python/pyspark/sql/utils.py and
    +# ./python/run-tests.py. In case of Arrow, you should also check ./pom.xml.
    --- End diff --
    
    ditto of https://github.com/apache/spark/pull/20487/files#r165873632


---



[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tests for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/528/
    Test PASSed.


---



[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly specify Pandas and PyArr...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    **[Test build #87134 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87134/testReport)** for PR 20487 at commit [`b7a940d`](https://github.com/apache/spark/commit/b7a940d159344d372cdeb56894e598b141a6dcff).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tests for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86997/
    Test PASSed.


---



[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tests for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/589/
    Test PASSed.


---



[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly specify Pandas and PyArr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tests for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87066/
    Test PASSed.


---



[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tests for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tests for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    **[Test build #87057 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87057/testReport)** for PR 20487 at commit [`a0e4b16`](https://github.com/apache/spark/commit/a0e4b166f71f9bb5f3e5af7843a03c11658892fd).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #20487: [SPARK-23319][TESTS] Explicitly specify Pandas an...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20487#discussion_r166536018
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -646,6 +646,9 @@ def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=Tr
             except Exception:
                 has_pandas = False
             if has_pandas and isinstance(data, pandas.DataFrame):
    +            from pyspark.sql.utils import require_minimum_pandas_version
    +            require_minimum_pandas_version()
    --- End diff --
    
    Just out of curiosity, do you have a list of the places where we do this version check for pandas and pyarrow?


---



[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tests for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    **[Test build #87016 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87016/testReport)** for PR 20487 at commit [`6403198`](https://github.com/apache/spark/commit/640319812307b166f060366d54974c7352e3d7ba).


---



[GitHub] spark pull request #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tes...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20487#discussion_r165883676
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -1923,6 +1923,9 @@ def toPandas(self):
             0    2  Alice
             1    5    Bob
             """
    +        from pyspark.sql.utils import require_minimum_pandas_version
    --- End diff --
    
    Should we also add `require_minimum_pandas_version()` at line 649 in session.py?
    https://github.com/apache/spark/blob/a0e4b166f71f9bb5f3e5af7843a03c11658892fd/python/pyspark/sql/session.py#L643-L653


---



[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly specify Pandas and PyArr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87139/
    Test PASSed.


---



[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly specify Pandas and PyArr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    Merged build finished. Test FAILed.


---



[GitHub] spark pull request #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tes...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20487#discussion_r165714499
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -1923,6 +1923,9 @@ def toPandas(self):
             0    2  Alice
             1    5    Bob
             """
    +        from pyspark.sql.utils import require_minimum_pandas_version
    --- End diff --
    
    `toPandas` already seems to fail when the data includes `TimestampType`:
    
    ```
    >>> import datetime
    >>> spark.createDataFrame([[datetime.datetime.now()]]).toPandas()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/.../spark/python/pyspark/sql/dataframe.py", line 1978, in toPandas
        _check_series_convert_timestamps_local_tz(pdf[field.name], timezone)
      File "/.../spark/python/pyspark/sql/types.py", line 1775, in _check_series_convert_timestamps_local_tz
        return _check_series_convert_timestamps_localize(s, None, timezone)
      File "/.../spark/python/pyspark/sql/types.py", line 1750, in _check_series_convert_timestamps_localize
        require_minimum_pandas_version()
      File "/.../spark/python/pyspark/sql/utils.py", line 128, in require_minimum_pandas_version
        "your version was %s." % (minimum_pandas_version, pandas.__version__))
    ImportError: Pandas >= 0.19.2 must be installed; however, your version was 0.16.0.
    ```
    
    Since we set the supported version, I think we should explicitly require that version. Let me know if anyone thinks differently.


---



[GitHub] spark pull request #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tes...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20487#discussion_r165795538
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -48,19 +48,26 @@
     else:
         import unittest
     
    -_have_pandas = False
    -_have_old_pandas = False
    +_pandas_requirement_message = None
     try:
    -    import pandas
    -    try:
    -        from pyspark.sql.utils import require_minimum_pandas_version
    -        require_minimum_pandas_version()
    -        _have_pandas = True
    -    except:
    -        _have_old_pandas = True
    -except:
    -    # No Pandas, but that's okay, we'll skip those tests
    -    pass
    +    from pyspark.sql.utils import require_minimum_pandas_version
    +    require_minimum_pandas_version()
    +except ImportError as e:
    +    from pyspark.util import _exception_message
    +    # If Pandas version requirement is not satisfied, skip related tests.
    +    _pandas_requirement_message = _exception_message(e)
    +
    +_pyarrow_requirement_message = None
    +try:
    +    from pyspark.sql.utils import require_minimum_pyarrow_version
    +    require_minimum_pyarrow_version()
    +except ImportError as e:
    +    from pyspark.util import _exception_message
    +    # If Arrow version requirement is not satisfied, skip related tests.
    +    _pyarrow_requirement_message = _exception_message(e)
    +
    +_have_pandas = _pandas_requirement_message is None
    +_have_pyarrow = _pyarrow_requirement_message is None
    --- End diff --
    
    Here is the logic I used:
    
    `_pyarrow_requirement_message` contains the error message for the PyArrow requirement if PyArrow is missing or its version does not satisfy the minimum.
    
    If `_pyarrow_requirement_message` contains a message, `_have_pyarrow` becomes `False`.


---



[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tests for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    **[Test build #86990 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86990/testReport)** for PR 20487 at commit [`08b42f8`](https://github.com/apache/spark/commit/08b42f80322636169fc440e0e2f36819b8d6e837).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tests for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/531/
    Test PASSed.


---



[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly specify Pandas and PyArr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/649/
    Test PASSed.


---



[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly specify Pandas and PyArr...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    LGTM


---



[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly specify Pandas and PyArr...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    Merged to master.


---



[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tests for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly specify Pandas and PyArr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tests for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/581/
    Test PASSed.


---



[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tests for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    **[Test build #86990 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86990/testReport)** for PR 20487 at commit [`08b42f8`](https://github.com/apache/spark/commit/08b42f80322636169fc440e0e2f36819b8d6e837).


---



[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tests for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87057/
    Test PASSed.


---



[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tests for ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    @ueshin, @icexelloss and @cloud-fan, would you mind taking a look please?


---



[GitHub] spark pull request #20487: [SPARK-23319][TESTS] Explicitly specify Pandas an...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20487#discussion_r166547502
  
    --- Diff: python/pyspark/sql/session.py ---
    @@ -646,6 +646,9 @@ def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=Tr
             except Exception:
                 has_pandas = False
             if has_pandas and isinstance(data, pandas.DataFrame):
    +            from pyspark.sql.utils import require_minimum_pandas_version
    +            require_minimum_pandas_version()
    --- End diff --
    
    I don't think I know all the places exactly. For now, I can think of: `createDataFrame` with Pandas DataFrame input, `toPandas` and `pandas_udf` for APIs, and some places in `session.py` / `types.py` for internal methods such as the `_check*` family, `*arrow*`, or `*pandas*`.
    
    I was thinking of consolidating those into a single module (file) after 2.3.0. I will cc you and @ueshin there.


---



[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tests for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tes...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20487#discussion_r165873582
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -2794,7 +2792,6 @@ def count_bucketed_cols(names, table="pyspark_bucket"):
     
         def _to_pandas(self):
             from datetime import datetime, date
    -        import numpy as np
    --- End diff --
    
    This import does not seem to be used in this function.


---



[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly specify Pandas and PyArr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tests for ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    Ah, nope. If the PyArrow version is lower than the one we claim to support, for example 0.7.0, the tests seem to fail:
    
    ```
    ======================================================================
    ERROR: test_vectorized_udf_wrong_return_type (pyspark.sql.tests.ScalarPandasUDF)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/.../spark/python/pyspark/sql/tests.py", line 4019, in test_vectorized_udf_wrong_return_type
        f = pandas_udf(lambda x: x * 1.0, MapType(LongType(), LongType()))
      File "/.../spark/python/pyspark/sql/functions.py", line 2309, in pandas_udf
        return _create_udf(f=f, returnType=return_type, evalType=eval_type)
      File "/.../spark/python/pyspark/sql/udf.py", line 47, in _create_udf
        require_minimum_pyarrow_version()
      File "/.../spark/python/pyspark/sql/utils.py", line 132, in require_minimum_pyarrow_version
        "however, your version was %s." % pyarrow.__version__)
    ImportError: pyarrow >= 0.8.0 must be installed on calling Python process; however, your version was 0.7.0.
    
    ----------------------------------------------------------------------
    Ran 33 tests in 8.098s
    
    FAILED (errors=33)
    ```
    
    I will clarify this in the PR description.


---



[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tests for ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    Looks like this PR doesn't skip the "old Pandas" tests, but rewrites them?


---



[GitHub] spark pull request #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tes...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20487#discussion_r165884027
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -1923,6 +1923,9 @@ def toPandas(self):
             0    2  Alice
             1    5    Bob
             """
    +        from pyspark.sql.utils import require_minimum_pandas_version
    --- End diff --
    
    Yup, let me take a shot at cleaning that up too.


---



[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tests for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly specify Pandas and PyArr...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    **[Test build #87139 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87139/testReport)** for PR 20487 at commit [`b7a940d`](https://github.com/apache/spark/commit/b7a940d159344d372cdeb56894e598b141a6dcff).


---



[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tests for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tes...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20487#discussion_r165855123
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -1923,6 +1923,9 @@ def toPandas(self):
             0    2  Alice
             1    5    Bob
             """
    +        from pyspark.sql.utils import require_minimum_pandas_version
    --- End diff --
    
    this is already called here though https://github.com/apache/spark/pull/20487/files#diff-6fc344560230bf0ef711bb9b5573f1faL1939
    am I missing something?


---



[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tests for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    **[Test build #87057 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87057/testReport)** for PR 20487 at commit [`a0e4b16`](https://github.com/apache/spark/commit/a0e4b166f71f9bb5f3e5af7843a03c11658892fd).


---



[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly specify Pandas and PyArr...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    **[Test build #87139 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87139/testReport)** for PR 20487 at commit [`b7a940d`](https://github.com/apache/spark/commit/b7a940d159344d372cdeb56894e598b141a6dcff).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tes...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20487#discussion_r165911313
  
    --- Diff: pom.xml ---
    @@ -185,6 +185,10 @@
         <paranamer.version>2.8</paranamer.version>
         <maven-antrun.version>1.8</maven-antrun.version>
         <commons-crypto.version>1.0.0</commons-crypto.version>
    +    <!--
    +    If you are changing Arrow version specification, please check ./python/pyspark/sql/utils.py,
    +    ./python/run-tests.py and ./python/setup.py too.
    --- End diff --
    
    yes, true - though I think just the file name is ok - they are distinct enough to find


---



[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tests for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    **[Test build #86997 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86997/testReport)** for PR 20487 at commit [`67efc40`](https://github.com/apache/spark/commit/67efc40bd4186786bc684f7780d3c2a3338a3fa6).


---



[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tests for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    **[Test build #86997 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86997/testReport)** for PR 20487 at commit [`67efc40`](https://github.com/apache/spark/commit/67efc40bd4186786bc684f7780d3c2a3338a3fa6).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tes...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20487#discussion_r165883842
  
    --- Diff: pom.xml ---
    @@ -185,6 +185,10 @@
         <paranamer.version>2.8</paranamer.version>
         <maven-antrun.version>1.8</maven-antrun.version>
         <commons-crypto.version>1.0.0</commons-crypto.version>
    +    <!--
    +    If you are changing Arrow version specification, please check ./python/pyspark/sql/utils.py,
    +    ./python/run-tests.py and ./python/setup.py too.
    --- End diff --
    
    Hmm, I thought the proper places to upgrade the versions are `setup.py` and `pom.xml`, so if we happen to update PyArrow (`pom.xml` / `setup.py`) or Pandas (`setup.py`), we would look at one of those places first.
    
    To be honest, I don't really like writing specific paths in those comments, because if the files are ever moved we would have to update all the comments.


---



[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tests for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    Merged build finished. Test FAILed.


---



[GitHub] spark pull request #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tes...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20487#discussion_r165884501
  
    --- Diff: pom.xml ---
    @@ -185,6 +185,10 @@
         <paranamer.version>2.8</paranamer.version>
         <maven-antrun.version>1.8</maven-antrun.version>
         <commons-crypto.version>1.0.0</commons-crypto.version>
    +    <!--
    +    If you are changing Arrow version specification, please check ./python/pyspark/sql/utils.py,
    +    ./python/run-tests.py and ./python/setup.py too.
    --- End diff --
    
    I see. I agree with it.


---



[GitHub] spark pull request #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tes...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20487#discussion_r165873632
  
    --- Diff: pom.xml ---
    @@ -185,6 +185,10 @@
         <paranamer.version>2.8</paranamer.version>
         <maven-antrun.version>1.8</maven-antrun.version>
         <commons-crypto.version>1.0.0</commons-crypto.version>
    +    <!--
    +    If you are changing Arrow version specification, please check ./python/pyspark/sql/utils.py,
    +    ./python/run-tests.py and ./python/setup.py too.
    --- End diff --
    
    `./python/run-tests.py` is not there yet. It's a part of https://github.com/apache/spark/pull/20473.


---



[GitHub] spark pull request #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tes...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20487#discussion_r165865310
  
    --- Diff: python/pyspark/sql/utils.py ---
    @@ -115,18 +115,30 @@ def toJArray(gateway, jtype, arr):
     def require_minimum_pandas_version():
         """ Raise ImportError if minimum version of Pandas is not installed
         """
    +    minimum_pandas_version = "0.19.2"
    +
         from distutils.version import LooseVersion
    -    import pandas
    -    if LooseVersion(pandas.__version__) < LooseVersion('0.19.2'):
    -        raise ImportError("Pandas >= 0.19.2 must be installed on calling Python process; "
    -                          "however, your version was %s." % pandas.__version__)
    +    try:
    +        import pandas
    +    except ImportError:
    +        raise ImportError("Pandas >= %s must be installed; however, "
    +                          "it was not found." % minimum_pandas_version)
    +    if LooseVersion(pandas.__version__) < LooseVersion(minimum_pandas_version):
    +        raise ImportError("Pandas >= %s must be installed; however, "
    +                          "your version was %s." % (minimum_pandas_version, pandas.__version__))
     
     
     def require_minimum_pyarrow_version():
         """ Raise ImportError if minimum version of pyarrow is not installed
         """
    +    minimum_pyarrow_version = "0.8.0"
    --- End diff --
    
    Sure.


---



[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tests for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tests for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tes...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20487#discussion_r165795335
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -2819,13 +2816,13 @@ def test_to_pandas(self):
             self.assertEquals(types[4], 'datetime64[ns]')
             self.assertEquals(types[5], 'datetime64[ns]')
     
    -    @unittest.skipIf(not _have_old_pandas, "Old Pandas not installed")
    -    def test_to_pandas_old(self):
    +    @unittest.skipIf(_have_pandas, "Required Pandas was found.")
    +    def test_to_pandas_required_pandas_not_found(self):
    --- End diff --
    
    Now, this is also tested when Pandas is missing.


---



[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tests for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    **[Test build #87134 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87134/testReport)** for PR 20487 at commit [`b7a940d`](https://github.com/apache/spark/commit/b7a940d159344d372cdeb56894e598b141a6dcff).


---



[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tests for ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    Will fix the test tomorrow.


---



[GitHub] spark pull request #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tes...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20487#discussion_r165795585
  
    --- Diff: python/pyspark/sql/utils.py ---
    @@ -115,18 +115,30 @@ def toJArray(gateway, jtype, arr):
     def require_minimum_pandas_version():
         """ Raise ImportError if minimum version of Pandas is not installed
         """
    +    minimum_pandas_version = "0.19.2"
    +
         from distutils.version import LooseVersion
    -    import pandas
    -    if LooseVersion(pandas.__version__) < LooseVersion('0.19.2'):
    -        raise ImportError("Pandas >= 0.19.2 must be installed on calling Python process; "
    -                          "however, your version was %s." % pandas.__version__)
    +    try:
    +        import pandas
    +    except ImportError:
    +        raise ImportError("Pandas >= %s must be installed; however, "
    +                          "it was not found." % minimum_pandas_version)
    --- End diff --
    
    I catch `ImportError` here just to make the error message nicer. 


---



[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly specify Pandas and PyArr...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    retest this please


---



[GitHub] spark pull request #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tes...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20487#discussion_r165855092
  
    --- Diff: python/pyspark/sql/utils.py ---
    @@ -115,18 +115,30 @@ def toJArray(gateway, jtype, arr):
     def require_minimum_pandas_version():
         """ Raise ImportError if minimum version of Pandas is not installed
         """
    +    minimum_pandas_version = "0.19.2"
    +
         from distutils.version import LooseVersion
    -    import pandas
    -    if LooseVersion(pandas.__version__) < LooseVersion('0.19.2'):
    -        raise ImportError("Pandas >= 0.19.2 must be installed on calling Python process; "
    -                          "however, your version was %s." % pandas.__version__)
    +    try:
    +        import pandas
    +    except ImportError:
    +        raise ImportError("Pandas >= %s must be installed; however, "
    +                          "it was not found." % minimum_pandas_version)
    +    if LooseVersion(pandas.__version__) < LooseVersion(minimum_pandas_version):
    +        raise ImportError("Pandas >= %s must be installed; however, "
    +                          "your version was %s." % (minimum_pandas_version, pandas.__version__))
     
     
     def require_minimum_pyarrow_version():
         """ Raise ImportError if minimum version of pyarrow is not installed
         """
    +    minimum_pyarrow_version = "0.8.0"
    --- End diff --
    
    maybe add a comment in https://github.com/apache/spark/blob/master/pom.xml#L188
    otherwise it's hard to remember to change it


---



[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tests for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86994/
    Test FAILed.


---



[GitHub] spark pull request #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tes...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20487#discussion_r165865284
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -1923,6 +1923,9 @@ def toPandas(self):
             0    2  Alice
             1    5    Bob
             """
    +        from pyspark.sql.utils import require_minimum_pandas_version
    --- End diff --
    
    Ah, that one is for pyarrow, whereas this one is for pandas. I wanted to produce a proper message before `import pandas as pd` runs :-).
    
    The case above (https://github.com/apache/spark/pull/20487/files#r165714499) is when Pandas is lower than 0.19.2. When Pandas is missing, it shows something like:
    
    ```
    >>> spark.range(1).toPandas()
    ```
    
    before:
    
    ```
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/.../spark/python/pyspark/sql/dataframe.py", line 1975, in toPandas
        import pandas as pd
    ImportError: No module named pandas
    ```
    
    after:
    
    ```
      File "<stdin>", line 1, in <module>
      File "/.../spark/python/pyspark/sql/dataframe.py", line 1927, in toPandas
        require_minimum_pandas_version()
      File "/.../spark/python/pyspark/sql/utils.py", line 125, in require_minimum_pandas_version
        "it was not found." % minimum_pandas_version)
    ImportError: Pandas >= 0.19.2 must be installed; however, it was not found.
    ```


---



[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tests for ...

Posted by icexelloss <gi...@git.apache.org>.
Github user icexelloss commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    @HyukjinKwon just to clarify: it seems these PySpark tests are already skipped when the required pyarrow and pandas are not found, and this PR refactors the error message to make that cleaner. Is that correct?


---



[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tests for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    **[Test build #87066 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87066/testReport)** for PR 20487 at commit [`873b4b9`](https://github.com/apache/spark/commit/873b4b96804ebc41b538a090064218141c0f2589).


---



[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tests for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    **[Test build #87016 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87016/testReport)** for PR 20487 at commit [`6403198`](https://github.com/apache/spark/commit/640319812307b166f060366d54974c7352e3d7ba).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tes...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20487#discussion_r165795135
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -2858,14 +2855,16 @@ def test_create_dataframe_from_pandas_with_timestamp(self):
             self.assertTrue(isinstance(df.schema['ts'].dataType, TimestampType))
             self.assertTrue(isinstance(df.schema['d'].dataType, DateType))
     
    -    @unittest.skipIf(not _have_old_pandas, "Old Pandas not installed")
    -    def test_create_dataframe_from_old_pandas(self):
    -        import pandas as pd
    -        from datetime import datetime
    -        pdf = pd.DataFrame({"ts": [datetime(2017, 10, 31, 1, 1, 1)],
    -                            "d": [pd.Timestamp.now().date()]})
    +    @unittest.skipIf(_have_pandas, "Required Pandas was found.")
    +    def test_create_dataframe_required_pandas_not_found(self):
             with QuietTest(self.sc):
    -            with self.assertRaisesRegexp(ImportError, 'Pandas >= .* must be installed'):
    +            with self.assertRaisesRegexp(
    +                    ImportError,
    +                    '(Pandas >= .* must be installed|No module named pandas)'):
    --- End diff --
    
    If Pandas is older than the version we require, it throws `Pandas >= .* must be installed`. If Pandas is not installed, `import pandas as pd` in the test throws an exception, "No module named pandas".
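
The alternation in the regular expression lets a single assertion cover both failure modes. A minimal, self-contained sketch of the idea follows; `fake_to_pandas` is a hypothetical stand-in for the Spark call under test, its messages are hard-coded to mirror the ones quoted in this thread, and `assertRaisesRegex` is the Python 3 spelling of `assertRaisesRegexp`.

```python
import unittest

EXPECTED = "(Pandas >= .* must be installed|No module named pandas)"


def fake_to_pandas(reason):
    # Hypothetical stand-in that fails the way each scenario would.
    if reason == "too_old":
        raise ImportError("Pandas >= 0.19.2 must be installed; however, "
                          "your version was 0.18.1.")
    # Note: Python 3 itself words this as "No module named 'pandas'" (quoted),
    # which the unquoted alternative in EXPECTED would not match.
    raise ImportError("No module named pandas")


class RequiredPandasNotFoundTest(unittest.TestCase):
    def test_old_pandas_message_matches(self):
        with self.assertRaisesRegex(ImportError, EXPECTED):
            fake_to_pandas("too_old")

    def test_missing_pandas_message_matches(self):
        with self.assertRaisesRegex(ImportError, EXPECTED):
            fake_to_pandas("missing")
```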


---



[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tests for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tests for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tests for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tests for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86990/
    Test FAILed.


---



[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly specify Pandas and PyArr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87134/
    Test FAILed.


---



[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly specify Pandas and PyArr...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/646/
    Test PASSed.


---



[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tests for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    **[Test build #86994 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86994/testReport)** for PR 20487 at commit [`1606070`](https://github.com/apache/spark/commit/1606070aa15a91878b499585edaa366a2f455b08).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tes...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20487#discussion_r165940636
  
    --- Diff: pom.xml ---
    @@ -185,6 +185,10 @@
         <paranamer.version>2.8</paranamer.version>
         <maven-antrun.version>1.8</maven-antrun.version>
         <commons-crypto.version>1.0.0</commons-crypto.version>
    +    <!--
    +    If you are changing Arrow version specification, please check ./python/pyspark/sql/utils.py,
    +    ./python/run-tests.py and ./python/setup.py too.
    --- End diff --
    
    Let me just keep them. Maybe I care too much about this, but \*/\*/pom.xml, [./dev/run-tests|./python/run-tests|./python/run-tests.py] and [util.py|utils.py] might be confusing.


---



[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tests for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tests for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/552/
    Test PASSed.


---



[GitHub] spark pull request #20487: [SPARK-23319][TESTS] Explicitly specify Pandas an...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/20487


---



[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tests for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    **[Test build #87066 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87066/testReport)** for PR 20487 at commit [`873b4b9`](https://github.com/apache/spark/commit/873b4b96804ebc41b538a090064218141c0f2589).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tests for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/534/
    Test PASSed.


---



[GitHub] spark pull request #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tes...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20487#discussion_r165879157
  
    --- Diff: pom.xml ---
    @@ -185,6 +185,10 @@
         <paranamer.version>2.8</paranamer.version>
         <maven-antrun.version>1.8</maven-antrun.version>
         <commons-crypto.version>1.0.0</commons-crypto.version>
    +    <!--
    +    If you are changing Arrow version specification, please check ./python/pyspark/sql/utils.py,
    +    ./python/run-tests.py and ./python/setup.py too.
    --- End diff --
    
    Should we add a similar comment to each `*.py` file, not only `setup.py`, so that they refer to one another? We should also add one for Pandas in each `*.py` file.


---



[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tests for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87016/
    Test PASSed.


---



[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tests for ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    Ah, yup. There are a few tests for old Pandas that ran only when the Pandas version was lower than required, and I rewrote them to run both when the Pandas version is lower and when Pandas is missing. Let me clarify the title and description.


---



[GitHub] spark issue #20487: [SPARK-23319][TESTS] Explicitly skips PySpark tests for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20487
  
    **[Test build #86994 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86994/testReport)** for PR 20487 at commit [`1606070`](https://github.com/apache/spark/commit/1606070aa15a91878b499585edaa366a2f455b08).


---
