Posted to commits@spark.apache.org by gu...@apache.org on 2023/02/09 23:49:10 UTC
[spark] branch branch-3.4 updated: [SPARK-42338][CONNECT] Add details to non-fatal errors to raise a proper exception in the Python client
This is an automated email from the ASF dual-hosted git repository.
gurwls223 pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.4 by this push:
new b2602c730fe [SPARK-42338][CONNECT] Add details to non-fatal errors to raise a proper exception in the Python client
b2602c730fe is described below
commit b2602c730fe063cd081077dd476431e525f3ca78
Author: Takuya UESHIN <ue...@databricks.com>
AuthorDate: Fri Feb 10 08:48:49 2023 +0900
[SPARK-42338][CONNECT] Add details to non-fatal errors to raise a proper exception in the Python client
### What changes were proposed in this pull request?
Adds details to non-fatal errors so that the Python client can raise a proper exception. This makes `df.sample` raise `IllegalArgumentException` the same as PySpark, except that the error is delayed until an action is called.
### Why are the changes needed?
Currently `SparkConnectService` does not add details for `NonFatal` exceptions to the `RPCStatus`, so the Python client can't detect the exception type and raises a generic `SparkConnectGrpcException` instead.
The status should also carry the details the Python client needs.
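As a rough illustration of the client-side half of this change (this is a hedged sketch, not the actual PySpark client code): once the server packs an `ErrorInfo` detail whose `reason` field holds the JVM exception class name, the client can map that reason to a specific Python exception class instead of the generic one. All class and function names below are hypothetical stand-ins.

```python
# Hypothetical sketch of reason-to-exception mapping on the client side.
# The server sets ErrorInfo.reason = nf.getClass.getName (per the diff below);
# the client picks the most specific local exception class for that reason.

class SparkConnectGrpcException(Exception):
    """Generic fallback when the error carries no recognized reason."""

class IllegalArgumentException(SparkConnectGrpcException):
    """Raised when the server reports java.lang.IllegalArgumentException."""

# Fully qualified JVM class name (ErrorInfo.reason) -> client-side type.
_EXCEPTION_BY_REASON = {
    "java.lang.IllegalArgumentException": IllegalArgumentException,
}

def exception_from_status(reason: str, message: str) -> SparkConnectGrpcException:
    """Pick the most specific exception class for a server-reported error."""
    cls = _EXCEPTION_BY_REASON.get(reason, SparkConnectGrpcException)
    return cls(message)
```

With this mapping in place, an unknown reason still falls back to the generic `SparkConnectGrpcException`, so the change is backward compatible for errors the client does not recognize.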
### Does this PR introduce _any_ user-facing change?
Users will see the proper exception when they call `df.sample` with illegal arguments, but at a different time: the error surfaces when an action is executed rather than when `sample` is called.
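The timing difference can be illustrated with a toy lazy API (this is not Spark code, just a sketch of the behavior): in Spark Connect the plan is only validated on the server, so building the plan with a bad fraction succeeds and the error is raised when an action such as `collect()` runs.

```python
# Toy stand-in for a lazily evaluated DataFrame; names are hypothetical.

class LazyFrame:
    def __init__(self) -> None:
        self._fraction = 1.0

    def sample(self, fraction: float) -> "LazyFrame":
        # No eager validation: building the plan always succeeds,
        # mirroring how the Connect client only records the plan.
        out = LazyFrame()
        out._fraction = fraction
        return out

    def collect(self) -> list:
        # Validation happens only when the action executes,
        # mirroring the server-side check.
        if not 0.0 <= self._fraction <= 1.0:
            raise ValueError(f"requirement failed: invalid fraction {self._fraction}")
        return []

df = LazyFrame().sample(-1.0)  # no error here
# df.collect() is where the (illustrative) IllegalArgumentException-style
# error would be raised.
```

In plain PySpark the check fires inside `sample` itself; under Spark Connect the same exception type is raised, just at action time.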
### How was this patch tested?
Enabled `DataFrameParityTests.test_sample`.
Closes #39957 from ueshin/issues/SPARK-42338/sample.
Authored-by: Takuya UESHIN <ue...@databricks.com>
Signed-off-by: Hyukjin Kwon <gu...@apache.org>
(cherry picked from commit ced675071f780f75435bef5d72b115ffa783e19e)
Signed-off-by: Hyukjin Kwon <gu...@apache.org>
---
.../org/apache/spark/sql/connect/service/SparkConnectService.scala | 7 +++++++
python/pyspark/sql/tests/connect/test_parity_dataframe.py | 3 +--
2 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala
index 683dae9cf90..25b7009860b 100644
--- a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala
+++ b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala
@@ -109,6 +109,13 @@ class SparkConnectService(debug: Boolean)
val status = RPCStatus
.newBuilder()
.setCode(RPCCode.INTERNAL_VALUE)
+ .addDetails(
+ ProtoAny.pack(
+ ErrorInfo
+ .newBuilder()
+ .setReason(nf.getClass.getName)
+ .setDomain("org.apache.spark")
+ .build()))
.setMessage(nf.getLocalizedMessage)
.build()
observer.onError(StatusProto.toStatusRuntimeException(status))
diff --git a/python/pyspark/sql/tests/connect/test_parity_dataframe.py b/python/pyspark/sql/tests/connect/test_parity_dataframe.py
index 7e6735cb7cd..8413dbaf06d 100644
--- a/python/pyspark/sql/tests/connect/test_parity_dataframe.py
+++ b/python/pyspark/sql/tests/connect/test_parity_dataframe.py
@@ -85,8 +85,7 @@ class DataFrameParityTests(DataFrameTestsMixin, ReusedConnectTestCase):
def test_same_semantics_error(self):
super().test_same_semantics_error()
- # TODO(SPARK-42338): Different exception in DataFrame.sample
- @unittest.skip("Fails in Spark Connect, should enable.")
+ # Spark Connect throws `IllegalArgumentException` when calling `collect` instead of `sample`.
def test_sample(self):
super().test_sample()
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org