Posted to commits@spark.apache.org by gu...@apache.org on 2023/02/09 23:49:10 UTC
[spark] branch branch-3.4 updated: [SPARK-42338][CONNECT] Add details to non-fatal errors to raise a proper exception in the Python client
This is an automated email from the ASF dual-hosted git repository.
gurwls223 pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.4 by this push:
new b2602c730fe [SPARK-42338][CONNECT] Add details to non-fatal errors to raise a proper exception in the Python client
b2602c730fe is described below
commit b2602c730fe063cd081077dd476431e525f3ca78
Author: Takuya UESHIN <ue...@databricks.com>
AuthorDate: Fri Feb 10 08:48:49 2023 +0900
[SPARK-42338][CONNECT] Add details to non-fatal errors to raise a proper exception in the Python client
### What changes were proposed in this pull request?
Adds details to non-fatal errors so that the Python client can raise a proper exception. This makes `df.sample` raise `IllegalArgumentException` the same as PySpark, except that the error is delayed until an action is called.
### Why are the changes needed?
Currently `SparkConnectService` does not add details for `NonFatal` exceptions to the `RPCStatus`, so the Python client can't detect the exception type and raises a generic `SparkConnectGrpcException` instead.
The status should also carry the details the Python client needs.
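As a rough illustration of the client-side half of this change (this is a hedged sketch, not the actual PySpark client code): once the server packs an `ErrorInfo` detail whose `reason` field holds the JVM exception class name, the client can map that reason to a specific Python exception class instead of the generic one. All class and function names below are hypothetical stand-ins.

```python
# Hypothetical sketch of reason-to-exception mapping on the client side.
# The server sets ErrorInfo.reason = nf.getClass.getName (per the diff below);
# the client picks the most specific local exception class for that reason.

class SparkConnectGrpcException(Exception):
    """Generic fallback when the error carries no recognized reason."""

class IllegalArgumentException(SparkConnectGrpcException):
    """Raised when the server reports java.lang.IllegalArgumentException."""

# Fully qualified JVM class name (ErrorInfo.reason) -> client-side type.
_EXCEPTION_BY_REASON = {
    "java.lang.IllegalArgumentException": IllegalArgumentException,
}

def exception_from_status(reason: str, message: str) -> SparkConnectGrpcException:
    """Pick the most specific exception class for a server-reported error."""
    cls = _EXCEPTION_BY_REASON.get(reason, SparkConnectGrpcException)
    return cls(message)
```

With this mapping in place, an unknown reason still falls back to the generic `SparkConnectGrpcException`, so the change is backward compatible for errors the client does not recognize.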
### Does this PR introduce _any_ user-facing change?
Users will see the proper exception when they call `df.sample` with illegal arguments, but at a different time: the error surfaces when an action is executed rather than when `sample` is called.
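The timing difference can be illustrated with a toy lazy API (this is not Spark code, just a sketch of the behavior): in Spark Connect the plan is only validated on the server, so building the plan with a bad fraction succeeds and the error is raised when an action such as `collect()` runs.

```python
# Toy stand-in for a lazily evaluated DataFrame; names are hypothetical.

class LazyFrame:
    def __init__(self) -> None:
        self._fraction = 1.0

    def sample(self, fraction: float) -> "LazyFrame":
        # No eager validation: building the plan always succeeds,
        # mirroring how the Connect client only records the plan.
        out = LazyFrame()
        out._fraction = fraction
        return out

    def collect(self) -> list:
        # Validation happens only when the action executes,
        # mirroring the server-side check.
        if not 0.0 <= self._fraction <= 1.0:
            raise ValueError(f"requirement failed: invalid fraction {self._fraction}")
        return []

df = LazyFrame().sample(-1.0)  # no error here
# df.collect() is where the (illustrative) IllegalArgumentException-style
# error would be raised.
```

In plain PySpark the check fires inside `sample` itself; under Spark Connect the same exception type is raised, just at action time.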
### How was this patch tested?
Enabled `DataFrameParityTests.test_sample`.
Closes #39957 from ueshin/issues/SPARK-42338/sample.
Authored-by: Takuya UESHIN <ue...@databricks.com>
Signed-off-by: Hyukjin Kwon <gu...@apache.org>
(cherry picked from commit ced675071f780f75435bef5d72b115ffa783e19e)
Signed-off-by: Hyukjin Kwon <gu...@apache.org>
---
.../org/apache/spark/sql/connect/service/SparkConnectService.scala | 7 +++++++
python/pyspark/sql/tests/connect/test_parity_dataframe.py | 3 +--
2 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala
index 683dae9cf90..25b7009860b 100644
--- a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala
+++ b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala
@@ -109,6 +109,13 @@ class SparkConnectService(debug: Boolean)
val status = RPCStatus
.newBuilder()
.setCode(RPCCode.INTERNAL_VALUE)
+ .addDetails(
+ ProtoAny.pack(
+ ErrorInfo
+ .newBuilder()
+ .setReason(nf.getClass.getName)
+ .setDomain("org.apache.spark")
+ .build()))
.setMessage(nf.getLocalizedMessage)
.build()
observer.onError(StatusProto.toStatusRuntimeException(status))
diff --git a/python/pyspark/sql/tests/connect/test_parity_dataframe.py b/python/pyspark/sql/tests/connect/test_parity_dataframe.py
index 7e6735cb7cd..8413dbaf06d 100644
--- a/python/pyspark/sql/tests/connect/test_parity_dataframe.py
+++ b/python/pyspark/sql/tests/connect/test_parity_dataframe.py
@@ -85,8 +85,7 @@ class DataFrameParityTests(DataFrameTestsMixin, ReusedConnectTestCase):
def test_same_semantics_error(self):
super().test_same_semantics_error()
- # TODO(SPARK-42338): Different exception in DataFrame.sample
- @unittest.skip("Fails in Spark Connect, should enable.")
+ # Spark Connect throws `IllegalArgumentException` when calling `collect` instead of `sample`.
def test_sample(self):
super().test_sample()
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org