Posted to reviews@spark.apache.org by "allisonwang-db (via GitHub)" <gi...@apache.org> on 2023/07/31 18:01:03 UTC

[GitHub] [spark] allisonwang-db commented on a diff in pull request #42196: [SPARK-44218] Customize diff log in assertDataFrameEqual error message format

allisonwang-db commented on code in PR #42196:
URL: https://github.com/apache/spark/pull/42196#discussion_r1279694241


##########
python/pyspark/sql/tests/test_utils.py:
##########
@@ -287,25 +302,30 @@ def test_assert_notequal_arraytype(self):
             ),
         )
 
-        expected_error_message = "Results do not match: "
-        percent_diff = (1 / 2) * 100
-        expected_error_message += "( %.5f %% )" % percent_diff
+        if isinstance(df2, DataFrame):
+            actual_str = df1._jdf.showString(2, 2, False)
+            expected_str = df2._jdf.showString(2, 2, False)
+        else:
+            # Spark Connect
+            actual_str = df1._show_string(2, 2, False)
+            expected_str = df2._show_string(2, 2, False)

Review Comment:
   Instead of using `_show_string` to invoke a Spark job, how about we convert the DataFrame to a pandas DataFrame? pandas should be a required dependency when Spark Connect is enabled.
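
   A minimal sketch of one way to read this suggestion, not the PR's actual implementation. The helper names `preview_via_pandas` and `build_diff_preview` are hypothetical. It assumes `df` is a PySpark DataFrame (classic or Spark Connect), since both expose the public `limit()` and `toPandas()` methods, and it relies on `pandas.DataFrame.to_string()` for the text rendering:

```python
def preview_via_pandas(df, num_rows=2):
    """Render the first `num_rows` rows of `df` as text by going through pandas.

    Assumption: `df` is a PySpark DataFrame (classic or Spark Connect).
    Only the public `limit()` and `toPandas()` methods are used, so the
    same code path serves both and no private API is touched.
    """
    # limit() keeps the collected result small before it reaches the driver.
    pdf = df.limit(num_rows).toPandas()
    # pandas.DataFrame.to_string() renders a plain-text table.
    return pdf.to_string(index=False)


def build_diff_preview(actual_df, expected_df, num_rows=2):
    """Hypothetical helper: pair the two previews for an error message."""
    return (
        "Actual:\n" + preview_via_pandas(actual_df, num_rows)
        + "\n\nExpected:\n" + preview_via_pandas(expected_df, num_rows)
    )
```

   With this approach the `isinstance(df2, DataFrame)` branch would no longer be needed in the test. The pandas rendering differs from the boxed table that `showString` produces, so the expected error message in the test would have to be built the same way.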



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

