You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "itholic (via GitHub)" <gi...@apache.org> on 2023/10/19 03:10:54 UTC

[PR] [SPARK-45555][PYTHON] Returning a debuggable object for failed assertion [spark]

itholic opened a new pull request, #43444:
URL: https://github.com/apache/spark/pull/43444

   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://spark.apache.org/contributing.html
     2. Ensure you have added or run the appropriate tests for your PR: https://spark.apache.org/developer-tools.html
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][SPARK-XXXX] Your PR title ...'.
     4. Be sure to keep the PR description updated to reflect all changes.
     5. Please write your PR title to summarize what this PR proposes.
     6. If possible, provide a concise example to reproduce the issue for a faster review.
     7. If you want to add a new configuration, please read the guideline first for naming configurations in
        'core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala'.
     8. If you want to add or modify an error type or message, please read the guideline first in
        'core/src/main/resources/error/README.md'.
   -->
   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   This PR proposes to enhanced the `assertDataFrameEqual` function to support an optional `returnUnequalRows` parameter. This parameter, will return the rows from both DataFrames that are not equal when set to `True`.
   
   ### Why are the changes needed?
   
   This enhancement provides users with an easier debugging experience by directly pointing out the rows that do not match, eliminating the need for manual comparison in case of large DataFrames.
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes. An optional parameter `returnUnequalRows` has been introduced in the `assertDataFrameEqual` function. When set to `True`, it will return unequal rows for further analysis. For example:
   
   ```python
   df1 = spark.createDataFrame(
       data=[("1", 1000.00), ("2", 3000.00), ("3", 2000.00)], schema=["id", "amount"])
   df2 = spark.createDataFrame(
       data=[("1", 1001.00), ("2", 3000.00), ("3", 2003.00)], schema=["id", "amount"])
   
   try:
       assertDataFrameEqual(df1, df2, returnUnequalRows=True)
   except PySparkAssertionError as e:
       spark.createDataFrame(e.data).show()
   ```
   
   The above code will produce the following DataFrame:
   ```
   +-----------+-----------+
   |         _1|         _2|
   +-----------+-----------+
   |{1, 1000.0}|{1, 1001.0}|
   |{3, 2000.0}|{3, 2003.0}|
   +-----------+-----------+
   ```
   
   ### How was this patch tested?
   
   Added usage example into doctest.
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   <!--
   If generative AI tooling has been used in the process of authoring this patch, please include the
   phrase: 'Generated-by: ' followed by the name of the tool and its version.
   If no, write 'No'.
   Please refer to the [ASF Generative Tooling Guidance](https://www.apache.org/legal/generative-tooling.html) for details.
   -->
   No.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45555][PYTHON] Includes a debuggable object for failed assertion [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #43444:
URL: https://github.com/apache/spark/pull/43444#issuecomment-1784365081

   Merged to master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45555][PYTHON] Returning a debuggable object for failed assertion [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #43444:
URL: https://github.com/apache/spark/pull/43444#issuecomment-1771872264

   let's also update PR description about the paramter.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45555][PYTHON] Returning a debuggable object for failed assertion [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #43444:
URL: https://github.com/apache/spark/pull/43444#discussion_r1365096108


##########
python/pyspark/testing/utils.py:
##########
@@ -424,6 +425,11 @@ def assertDataFrameEqual(
     atol : float, optional
         The absolute tolerance, used in asserting approximate equality for float values in actual
         and expected. Set to 1e-8 by default. (See Notes)
+    returnUnequalRows: bool, False

Review Comment:
   That makes sense to me. Updated the parameter name and following decriptions.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45555][PYTHON] Returning a debuggable object for failed assertion [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #43444:
URL: https://github.com/apache/spark/pull/43444#discussion_r1366330564


##########
python/pyspark/testing/utils.py:
##########
@@ -592,7 +616,7 @@ def assert_rows_equal(rows1: List[Row], rows2: List[Row]):
             rows_str2 += str(r2) + "\n"
             if not compare_rows(r1, r2):
                 diff_rows_cnt += 1
-                diff_rows = True
+                diff_rows.append((r1, r2))

Review Comment:
   @itholic let's optimize this. if `includeDiffRows` is disabled, we don't need to keep the diff rows which costs memory.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45555][PYTHON] Returning a debuggable object for failed assertion [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #43444:
URL: https://github.com/apache/spark/pull/43444#discussion_r1366357398


##########
python/pyspark/errors/exceptions/base.py:
##########
@@ -31,6 +32,7 @@ def __init__(
         message: Optional[str] = None,
         error_class: Optional[str] = None,
         message_parameters: Optional[Dict[str, str]] = None,
+        data: Optional[List[Row]] = None,

Review Comment:
   BTW, just realized that `mypy` doesn't throw an error when we pass a `list` to the `Tuple` or a `tuple` to the `List`. Just FYI πŸ€”



##########
python/pyspark/errors/exceptions/base.py:
##########
@@ -31,6 +32,7 @@ def __init__(
         message: Optional[str] = None,
         error_class: Optional[str] = None,
         message_parameters: Optional[Dict[str, str]] = None,
+        data: Optional[List[Row]] = None,

Review Comment:
   BTW, just realized that `mypy` doesn't throw an error when we pass a `list` to the `Tuple` or a `tuple` to the `List`. Just FYI πŸ€”



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45555][PYTHON] Returning a debuggable object for failed assertion [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #43444:
URL: https://github.com/apache/spark/pull/43444#discussion_r1366349292


##########
python/pyspark/errors/exceptions/base.py:
##########
@@ -31,6 +32,7 @@ def __init__(
         message: Optional[str] = None,
         error_class: Optional[str] = None,
         message_parameters: Optional[Dict[str, str]] = None,
+        data: Optional[List[Row]] = None,

Review Comment:
   Is this typing correct? seems we append `diff_rows.append((r1, r2))`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45555][PYTHON] Returning a debuggable object for failed assertion [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #43444:
URL: https://github.com/apache/spark/pull/43444#discussion_r1366346798


##########
python/pyspark/testing/utils.py:
##########
@@ -592,7 +616,7 @@ def assert_rows_equal(rows1: List[Row], rows2: List[Row]):
             rows_str2 += str(r2) + "\n"
             if not compare_rows(r1, r2):
                 diff_rows_cnt += 1
-                diff_rows = True
+                diff_rows.append((r1, r2))

Review Comment:
   Sounds good. Updated



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45555][PYTHON] Includes a debuggable object for failed assertion [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #43444:
URL: https://github.com/apache/spark/pull/43444#discussion_r1374001109


##########
python/pyspark/errors/exceptions/base.py:
##########
@@ -31,6 +34,7 @@ def __init__(
         message: Optional[str] = None,
         error_class: Optional[str] = None,
         message_parameters: Optional[Dict[str, str]] = None,
+        data: Optional[Iterable["Row"]] = None,

Review Comment:
   Actually good point. we can add it to `PySparkAssertionError` for now, and extend it later if we want



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45555][PYTHON] Returning a debuggable object for failed assertion [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #43444:
URL: https://github.com/apache/spark/pull/43444#discussion_r1366355971


##########
python/pyspark/errors/exceptions/base.py:
##########
@@ -31,6 +32,7 @@ def __init__(
         message: Optional[str] = None,
         error_class: Optional[str] = None,
         message_parameters: Optional[Dict[str, str]] = None,
+        data: Optional[List[Row]] = None,

Review Comment:
   On second thought, using `Iterable` seems more reasonable in this case. Let me update it again.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45555][PYTHON] Returning a debuggable object for failed assertion [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on PR #43444:
URL: https://github.com/apache/spark/pull/43444#issuecomment-1771894239

   Updated!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45555][PYTHON] Returning a debuggable object for failed assertion [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on PR #43444:
URL: https://github.com/apache/spark/pull/43444#issuecomment-1769822729

   cc @HyukjinKwon @allanf-db 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45555][PYTHON] Returning a debuggable object for failed assertion [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #43444:
URL: https://github.com/apache/spark/pull/43444#discussion_r1364863882


##########
python/pyspark/testing/utils.py:
##########
@@ -424,6 +425,11 @@ def assertDataFrameEqual(
     atol : float, optional
         The absolute tolerance, used in asserting approximate equality for float values in actual
         and expected. Set to 1e-8 by default. (See Notes)
+    returnUnequalRows: bool, False

Review Comment:
   on second thought, I think we should rename this parameter to something else like `includeDiffRows` because technically it does not return but let the exception include the different rows. cc @allanf-db 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45555][PYTHON] Returning a debuggable object for failed assertion [spark]

Posted by "allanf-db (via GitHub)" <gi...@apache.org>.
allanf-db commented on code in PR #43444:
URL: https://github.com/apache/spark/pull/43444#discussion_r1366301872


##########
python/pyspark/testing/utils.py:
##########
@@ -424,6 +425,11 @@ def assertDataFrameEqual(
     atol : float, optional
         The absolute tolerance, used in asserting approximate equality for float values in actual
         and expected. Set to 1e-8 by default. (See Notes)
+    returnUnequalRows: bool, False

Review Comment:
   Makes sense to me too, thanks



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45555][PYTHON] Returning a debuggable object for failed assertion [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #43444:
URL: https://github.com/apache/spark/pull/43444#discussion_r1366330932


##########
python/pyspark/errors/exceptions/base.py:
##########
@@ -31,6 +31,7 @@ def __init__(
         message: Optional[str] = None,
         error_class: Optional[str] = None,
         message_parameters: Optional[Dict[str, str]] = None,
+        data: Optional[Any] = None,

Review Comment:
   Can we type? e.g., tuple and/or dict



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45555][PYTHON] Includes a debuggable object for failed assertion [spark]

Posted by "allisonwang-db (via GitHub)" <gi...@apache.org>.
allisonwang-db commented on code in PR #43444:
URL: https://github.com/apache/spark/pull/43444#discussion_r1373964724


##########
python/pyspark/testing/utils.py:
##########
@@ -592,20 +617,22 @@ def assert_rows_equal(rows1: List[Row], rows2: List[Row]):
             rows_str2 += str(r2) + "\n"
             if not compare_rows(r1, r2):
                 diff_rows_cnt += 1
-                diff_rows = True
+                has_diff_rows = True
+                if includeDiffRows:
+                    diff_rows.append((r1, r2))
 
         generated_diff = _context_diff(
             actual=rows_str1.splitlines(), expected=rows_str2.splitlines(), n=len(zipped)
         )
 
-        if diff_rows:
+        if has_diff_rows:
             error_msg = "Results do not match: "
             percent_diff = (diff_rows_cnt / len(zipped)) * 100
             error_msg += "( %.5f %% )" % percent_diff
             error_msg += "\n" + "\n".join(generated_diff)
+            data = diff_rows if includeDiffRows else None
             raise PySparkAssertionError(
-                error_class="DIFFERENT_ROWS",
-                message_parameters={"error_msg": error_msg},
+                error_class="DIFFERENT_ROWS", message_parameters={"error_msg": error_msg}, data=data

Review Comment:
   Can we add some tests for this new functionality in `test_utils`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45555][PYTHON] Includes a debuggable object for failed assertion [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #43444:
URL: https://github.com/apache/spark/pull/43444#discussion_r1373988507


##########
python/pyspark/errors/exceptions/base.py:
##########
@@ -31,6 +34,7 @@ def __init__(
         message: Optional[str] = None,
         error_class: Optional[str] = None,
         message_parameters: Optional[Dict[str, str]] = None,
+        data: Optional[Iterable["Row"]] = None,

Review Comment:
   I initially added this parameter as a member of the super class(`PySparkException`) considering future expansion. For example, we could consider the case of later extending this member in `PySparkValueError` or `PySparkTypeError` to find problematic values ​​or types.
   
   But it's true that this is currently only used for the logic related to `PySparkAssertionError`.
   
   Can I ask you opinion, @HyukjinKwon ? Should we keep this parameter here as it is, or move into `PySparkAssertionError`?



##########
python/pyspark/errors/exceptions/base.py:
##########
@@ -31,6 +34,7 @@ def __init__(
         message: Optional[str] = None,
         error_class: Optional[str] = None,
         message_parameters: Optional[Dict[str, str]] = None,
+        data: Optional[Iterable["Row"]] = None,

Review Comment:
   I initially added this parameter as a member of the super class(`PySparkException`) considering future expansion. For example, we could consider the case of later extending this member in `PySparkValueError` or `PySparkTypeError` to find problematic values ​​or types.
   
   But it's true that this is currently only used for the logic related to `PySparkAssertionError`.
   
   Can I ask you opinion, @HyukjinKwon ? Should we keep this parameter here as it is, or move into `PySparkAssertionError`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45555][PYTHON] Returning a debuggable object for failed assertion [spark]

Posted by "allisonwang-db (via GitHub)" <gi...@apache.org>.
allisonwang-db commented on code in PR #43444:
URL: https://github.com/apache/spark/pull/43444#discussion_r1373962947


##########
python/pyspark/errors/exceptions/base.py:
##########
@@ -31,6 +34,7 @@ def __init__(
         message: Optional[str] = None,
         error_class: Optional[str] = None,
         message_parameters: Optional[Dict[str, str]] = None,
+        data: Optional[Iterable["Row"]] = None,

Review Comment:
   Will this be useful for any other exceptions other than `PySparkAssertionError`? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45555][PYTHON] Includes a debuggable object for failed assertion [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #43444:
URL: https://github.com/apache/spark/pull/43444#discussion_r1374018235


##########
python/pyspark/errors/exceptions/base.py:
##########
@@ -31,6 +34,7 @@ def __init__(
         message: Optional[str] = None,
         error_class: Optional[str] = None,
         message_parameters: Optional[Dict[str, str]] = None,
+        data: Optional[Iterable["Row"]] = None,

Review Comment:
   Updated. Thanks!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45555][PYTHON] Includes a debuggable object for failed assertion [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #43444:
URL: https://github.com/apache/spark/pull/43444#discussion_r1373988727


##########
python/pyspark/testing/utils.py:
##########
@@ -592,20 +617,22 @@ def assert_rows_equal(rows1: List[Row], rows2: List[Row]):
             rows_str2 += str(r2) + "\n"
             if not compare_rows(r1, r2):
                 diff_rows_cnt += 1
-                diff_rows = True
+                has_diff_rows = True
+                if includeDiffRows:
+                    diff_rows.append((r1, r2))
 
         generated_diff = _context_diff(
             actual=rows_str1.splitlines(), expected=rows_str2.splitlines(), n=len(zipped)
         )
 
-        if diff_rows:
+        if has_diff_rows:
             error_msg = "Results do not match: "
             percent_diff = (diff_rows_cnt / len(zipped)) * 100
             error_msg += "( %.5f %% )" % percent_diff
             error_msg += "\n" + "\n".join(generated_diff)
+            data = diff_rows if includeDiffRows else None
             raise PySparkAssertionError(
-                error_class="DIFFERENT_ROWS",
-                message_parameters={"error_msg": error_msg},
+                error_class="DIFFERENT_ROWS", message_parameters={"error_msg": error_msg}, data=data

Review Comment:
   Sure! Will add it



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45555][PYTHON] Includes a debuggable object for failed assertion [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon closed pull request #43444: [SPARK-45555][PYTHON] Includes a debuggable object for failed assertion
URL: https://github.com/apache/spark/pull/43444


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45555][PYTHON] Includes a debuggable object for failed assertion [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #43444:
URL: https://github.com/apache/spark/pull/43444#issuecomment-1800733970

   Merged to master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45555][PYTHON] Returning a debuggable object for failed assertion [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #43444:
URL: https://github.com/apache/spark/pull/43444#discussion_r1366346557


##########
python/pyspark/errors/exceptions/base.py:
##########
@@ -31,6 +31,7 @@ def __init__(
         message: Optional[str] = None,
         error_class: Optional[str] = None,
         message_parameters: Optional[Dict[str, str]] = None,
+        data: Optional[Any] = None,

Review Comment:
   Sure. I initially set it as `Any` to allow taking other data types in other use cases in the future, but currently we only take list of Rows.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45555][PYTHON] Returning a debuggable object for failed assertion [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #43444:
URL: https://github.com/apache/spark/pull/43444#discussion_r1366346557


##########
python/pyspark/errors/exceptions/base.py:
##########
@@ -31,6 +31,7 @@ def __init__(
         message: Optional[str] = None,
         error_class: Optional[str] = None,
         message_parameters: Optional[Dict[str, str]] = None,
+        data: Optional[Any] = None,

Review Comment:
   Sure. I initially set it as `Any` to allow taking other data types in other use cases, but currently we only take list of Rows.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org