Posted to issues@spark.apache.org by "Bryan Cutler (JIRA)" <ji...@apache.org> on 2019/04/04 18:51:00 UTC

[jira] [Created] (SPARK-27387) Replace sqlutils assertPandasEqual with Pandas assert_frame_equal in tests

Bryan Cutler created SPARK-27387:
------------------------------------

             Summary: Replace sqlutils assertPandasEqual with Pandas assert_frame_equal in tests
                 Key: SPARK-27387
                 URL: https://issues.apache.org/jira/browse/SPARK-27387
             Project: Spark
          Issue Type: Bug
          Components: PySpark, Tests
    Affects Versions: 2.4.1
            Reporter: Bryan Cutler


In PySpark unit tests, ReusedSQLTestCase.assertPandasEqual in sqlutils is meant to check whether two pandas.DataFrames are equal, but it seems that with later versions of Pandas it can fail if the DataFrame has an array column. The method can be replaced by {{assert_frame_equal}} from pandas.util.testing, which is meant for exactly this purpose and also gives a better assertion message.

The test failure I have seen is:

 {noformat}
======================================================================
ERROR: test_supported_types (pyspark.sql.tests.test_pandas_udf_grouped_map.GroupedMapPandasUDFTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/bryan/git/spark/python/pyspark/sql/tests/test_pandas_udf_grouped_map.py", line 128, in test_supported_types
    self.assertPandasEqual(expected1, result1)
  File "/home/bryan/git/spark/python/pyspark/testing/sqlutils.py", line 268, in assertPandasEqual
    self.assertTrue(expected.equals(result), msg=msg)
  File "/home/bryan/miniconda2/envs/pa012/lib/python3.6/site-packages/pandas

...
  File "pandas/_libs/lib.pyx", line 523, in pandas._libs.lib.array_equivalent_object
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
 {noformat}
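
A minimal sketch of the proposed replacement, not the actual Spark patch. It assumes a pandas release that exposes {{assert_frame_equal}} under pandas.testing (the description above names pandas.util.testing, which is where it was importable from at the time). The helper name frames_match and the sample data are hypothetical and only serve to show the comparison on a column holding arrays.

```python
import numpy as np
import pandas as pd
# Assumption: pandas.testing is available in the installed pandas version.
from pandas.testing import assert_frame_equal


def frames_match(expected, result):
    """Hypothetical helper: return (True, "") if the frames are equal,
    otherwise (False, <assertion message from pandas>)."""
    try:
        assert_frame_equal(expected, result)
    except AssertionError as exc:
        return False, str(exc)
    return True, ""


# Illustrative frames with an object column whose cells are NumPy arrays.
expected = pd.DataFrame({"id": [1, 2], "arr": [np.array([1, 2]), np.array([3, 4])]})
same = pd.DataFrame({"id": [1, 2], "arr": [np.array([1, 2]), np.array([3, 4])]})
different = pd.DataFrame({"id": [1, 2], "arr": [np.array([1, 2]), np.array([3, 5])]})

ok, _ = frames_match(expected, same)
bad, message = frames_match(expected, different)
print(ok, bad)
print(message)  # the pandas message describes where the frames differ
```

In a test case the helper would not be needed; calling assert_frame_equal(expected, result) directly in place of self.assertPandasEqual(expected, result) lets the pandas assertion message surface in the test report.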


