Posted to issues@spark.apache.org by "Willi Raschkowski (Jira)" <ji...@apache.org> on 2021/11/25 17:23:00 UTC

[jira] [Updated] (SPARK-37465) PySpark tests failing on Pandas 0.23

     [ https://issues.apache.org/jira/browse/SPARK-37465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Willi Raschkowski updated SPARK-37465:
--------------------------------------
    Description: 
I ran the Spark tests with Pandas {{0.23.4}} and got the error below. The minimum Pandas version is currently {{0.23.2}} [(GitHub)|https://github.com/apache/spark/blob/v3.2.0/python/setup.py#L114]. Upgrading to {{0.24.0}} fixes the error. I think Spark needs [this fix (GitHub)|https://github.com/pandas-dev/pandas/pull/21160/files#diff-1b7183f5b3970e2a1d39a82d71686e39c765d18a34231b54c857b0c4c9bb8222] in Pandas.
{code:java}
$ python/run-tests --testnames 'pyspark.pandas.tests.data_type_ops.test_boolean_ops BooleanOpsTest.test_floordiv'

...

======================================================================
ERROR [5.785s]: test_floordiv (pyspark.pandas.tests.data_type_ops.test_boolean_ops.BooleanOpsTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/circleci/project/python/pyspark/pandas/tests/data_type_ops/test_boolean_ops.py", line 128, in test_floordiv
    self.assert_eq(b_pser // b_pser.astype(int), b_psser // b_psser.astype(int))
  File "/home/circleci/miniconda/envs/python3/lib/python3.6/site-packages/pandas/core/ops.py", line 1069, in wrapper
    result = safe_na_op(lvalues, rvalues)
  File "/home/circleci/miniconda/envs/python3/lib/python3.6/site-packages/pandas/core/ops.py", line 1033, in safe_na_op
    return na_op(lvalues, rvalues)
  File "/home/circleci/miniconda/envs/python3/lib/python3.6/site-packages/pandas/core/ops.py", line 1027, in na_op
    result = missing.fill_zeros(result, x, y, op_name, fill_zeros)
  File "/home/circleci/miniconda/envs/python3/lib/python3.6/site-packages/pandas/core/missing.py", line 641, in fill_zeros
    signs = np.sign(y if name.startswith(('r', '__r')) else x)
TypeError: ufunc 'sign' did not contain a loop with signature matching types dtype('bool') dtype('bool')
{code}
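The last frame of the traceback shows {{np.sign}} being called on boolean data. The following minimal sketch isolates that step outside of Pandas and Spark. It assumes only that NumPy is installed; the helper name {{sign_accepts}} is hypothetical and is not part of NumPy, Pandas, or PySpark.

```python
import numpy as np


def sign_accepts(values):
    """Return True if np.sign can be applied to `values`, False on TypeError.

    Hypothetical helper for illustration only.
    """
    try:
        np.sign(values)
    except TypeError:
        # NumPy raises TypeError when the ufunc has no loop for the input dtype.
        return False
    return True


bools = np.array([True, False, True])

# Boolean input, as in the failing frame of the traceback.
print(sign_accepts(bools))

# The same values cast to integers, for comparison.
print(sign_accepts(bools.astype(int)))
```

This only reproduces the NumPy behaviour behind the error message. It does not show how the linked Pandas change avoids the call.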
These are my relevant package versions:
{code:java}
$ conda list | grep -e numpy -e pyarrow -e pandas -e python
# packages in environment at /home/circleci/miniconda/envs/python3:
numpy                     1.16.6           py36h0a8e133_3  
numpy-base                1.16.6           py36h41b4c56_3  
pandas                    0.23.4           py36h04863e7_0  
pyarrow                   1.0.1           py36h6200943_36_cpu    conda-forge
python                    3.6.12               hcff3b4d_2    anaconda
python-dateutil           2.8.1                      py_0    anaconda
python_abi                3.6                     1_cp36m    conda-forg
{code}
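To make the version relationship explicit: the installed Pandas {{0.23.4}} satisfies the declared minimum {{0.23.2}} but is older than {{0.24.0}}, the version reported above to fix the error. The sketch below checks this with plain tuple comparison. The helpers {{parse_version}} and {{meets_minimum}} are hypothetical, handle only purely numeric dotted versions, and are not how Spark itself checks its dependencies.

```python
def parse_version(text):
    """Turn a dotted release string such as '0.23.4' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed, minimum):
    """Return True if `installed` is at least `minimum` (numeric versions only)."""
    return parse_version(installed) >= parse_version(minimum)


installed_pandas = "0.23.4"  # from the conda listing above

# Against the minimum declared in setup.py for Spark 3.2.0.
print(meets_minimum(installed_pandas, "0.23.2"))

# Against the version reported to fix the failing test.
print(meets_minimum(installed_pandas, "0.24.0"))
```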


> PySpark tests failing on Pandas 0.23
> ------------------------------------
>
>                 Key: SPARK-37465
>                 URL: https://issues.apache.org/jira/browse/SPARK-37465
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.2.0
>            Reporter: Willi Raschkowski
>            Priority: Major


