You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "HyukjinKwon (via GitHub)" <gi...@apache.org> on 2023/02/09 03:20:43 UTC
[GitHub] [spark] HyukjinKwon opened a new pull request, #39947: [SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception
HyukjinKwon opened a new pull request, #39947:
URL: https://github.com/apache/spark/pull/39947
### What changes were proposed in this pull request?
This PR proposes to take the super classes into account when throwing an exception from the server to Python side by adding more metadata of classes, causes and traceback in JVM.
In addition, this PR matches the exceptions being thrown to the regular PySpark exceptions defined:
https://github.com/apache/spark/blob/04550edd49ee587656d215e59d6a072772d7d5ec/python/pyspark/errors/exceptions/captured.py#L108-L147
### Why are the changes needed?
Right now, many exceptions cannot be handled (e.g., `NoSuchDatabaseException` that inherits `AnalysisException`) in Python side.
### Does this PR introduce _any_ user-facing change?
No to end users.
Yes, it matches the exceptions to the regular PySpark exceptions.
### How was this patch tested?
Unittests fixed.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on pull request #39947: [SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception
Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #39947:
URL: https://github.com/apache/spark/pull/39947#issuecomment-1424167744
cc @xinrong-meng @grundprinzip @ueshin @zhengruifeng PTAL
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] ueshin commented on a diff in pull request #39947: [SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception
Posted by "ueshin (via GitHub)" <gi...@apache.org>.
ueshin commented on code in PR #39947:
URL: https://github.com/apache/spark/pull/39947#discussion_r1102122698
##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala:
##########
@@ -83,39 +106,35 @@ class SparkConnectService(debug: Boolean)
private def handleError[V](
opType: String,
observer: StreamObserver[V]): PartialFunction[Throwable, Unit] = {
- case ae: AnalysisException =>
- logError(s"Error during: $opType", ae)
- val status = RPCStatus
- .newBuilder()
- .setCode(RPCCode.INTERNAL_VALUE)
- .addDetails(
- ProtoAny.pack(
- ErrorInfo
- .newBuilder()
- .setReason(ae.getClass.getName)
- .setDomain("org.apache.spark")
- .putMetadata("message", ae.getSimpleMessage)
- .putMetadata("plan", Option(ae.plan).flatten.map(p => s"$p").getOrElse(""))
- .build()))
- .setMessage(ae.getLocalizedMessage)
- .build()
- observer.onError(StatusProto.toStatusRuntimeException(status))
+ case se: SparkException
+ if se.getCause != null && se.getCause
+ .isInstanceOf[PythonException] && se.getCause.getStackTrace
+ .exists(_.toString.contains("org.apache.spark.sql.execution.python")) =>
+ // Python UDF execution
Review Comment:
I guess we should extract this part to a method with a proper name for readability?
##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala:
##########
@@ -83,39 +106,35 @@ class SparkConnectService(debug: Boolean)
private def handleError[V](
opType: String,
observer: StreamObserver[V]): PartialFunction[Throwable, Unit] = {
- case ae: AnalysisException =>
- logError(s"Error during: $opType", ae)
- val status = RPCStatus
- .newBuilder()
- .setCode(RPCCode.INTERNAL_VALUE)
- .addDetails(
- ProtoAny.pack(
- ErrorInfo
- .newBuilder()
- .setReason(ae.getClass.getName)
- .setDomain("org.apache.spark")
- .putMetadata("message", ae.getSimpleMessage)
- .putMetadata("plan", Option(ae.plan).flatten.map(p => s"$p").getOrElse(""))
- .build()))
- .setMessage(ae.getLocalizedMessage)
- .build()
- observer.onError(StatusProto.toStatusRuntimeException(status))
+ case se: SparkException
+ if se.getCause != null && se.getCause
+ .isInstanceOf[PythonException] && se.getCause.getStackTrace
+ .exists(_.toString.contains("org.apache.spark.sql.execution.python")) =>
+ // Python UDF execution
+ logError(s"Error during: $opType", se)
+ observer.onError(
+ StatusProto.toStatusRuntimeException(buildStatusFromThrowable(se.getCause)))
+
case st: SparkThrowable =>
logError(s"Error during: $opType", st)
- val status = buildStatusFromThrowable(st)
- observer.onError(StatusProto.toStatusRuntimeException(status))
+ observer.onError(StatusProto.toStatusRuntimeException(buildStatusFromThrowable(st)))
+
case NonFatal(nf) =>
logError(s"Error during: $opType", nf)
val status = RPCStatus
.newBuilder()
.setCode(RPCCode.INTERNAL_VALUE)
- .setMessage(nf.getLocalizedMessage)
+ .setMessage(StringUtils.abbreviate(nf.getMessage, 2048))
.build()
observer.onError(StatusProto.toStatusRuntimeException(status))
Review Comment:
Need to add the details: #39957.
##########
python/pyspark/errors/exceptions/connect.py:
##########
@@ -61,41 +97,34 @@ class AnalysisException(SparkConnectGrpcException, BaseAnalysisException):
Failed to analyze a SQL query plan from Spark Connect server.
"""
- def __init__(
- self,
- message: Optional[str] = None,
- error_class: Optional[str] = None,
- message_parameters: Optional[Dict[str, str]] = None,
- plan: Optional[str] = None,
- reason: Optional[str] = None,
- ) -> None:
- self.message = message # type: ignore[assignment]
- if plan is not None:
- self.message = f"{self.message}\nPlan: {plan}"
Review Comment:
What's the captured one's stack trace like?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39947: [SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception
Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #39947:
URL: https://github.com/apache/spark/pull/39947#discussion_r1101374445
##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala:
##########
@@ -53,19 +59,36 @@ class SparkConnectService(debug: Boolean)
extends SparkConnectServiceGrpc.SparkConnectServiceImplBase
with Logging {
- private def buildStatusFromThrowable[A <: Throwable with SparkThrowable](st: A): RPCStatus = {
- val t = Option(st.getCause).getOrElse(st)
+ private def allClasses(cl: Class[_]): Seq[Class[_]] = {
+ val classes = ArrayBuffer.empty[Class[_]]
+ if (cl != null && !cl.equals(classOf[java.lang.Object])) {
+ classes.append(cl) // Includes itself.
+ }
+
+ @tailrec
+ def appendSuperClasses(clazz: Class[_]): Unit = {
+ if (clazz == null || clazz.equals(classOf[java.lang.Object])) return
+ classes.append(clazz.getSuperclass)
+ appendSuperClasses(clazz.getSuperclass)
+ }
+
+ appendSuperClasses(cl)
+ classes.toSeq
+ }
+
+ private def buildStatusFromThrowable(st: Throwable): RPCStatus = {
RPCStatus
.newBuilder()
.setCode(RPCCode.INTERNAL_VALUE)
.addDetails(
ProtoAny.pack(
ErrorInfo
.newBuilder()
- .setReason(t.getClass.getName)
+ .setReason(st.getClass.getName)
.setDomain("org.apache.spark")
+ .putMetadata("classes", compact(render(allClasses(st.getClass).map(_.getName))))
.build()))
- .setMessage(t.getLocalizedMessage)
Review Comment:
`getLocalizedMessage` is not used in our codebase.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39947: [SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception
Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #39947:
URL: https://github.com/apache/spark/pull/39947#discussion_r1102174832
##########
python/pyspark/errors/exceptions/connect.py:
##########
@@ -61,41 +97,34 @@ class AnalysisException(SparkConnectGrpcException, BaseAnalysisException):
Failed to analyze a SQL query plan from Spark Connect server.
"""
- def __init__(
- self,
- message: Optional[str] = None,
- error_class: Optional[str] = None,
- message_parameters: Optional[Dict[str, str]] = None,
- plan: Optional[str] = None,
- reason: Optional[str] = None,
- ) -> None:
- self.message = message # type: ignore[assignment]
- if plan is not None:
- self.message = f"{self.message}\nPlan: {plan}"
Review Comment:
eh, yeah. It still shows the plan. This is the part of `getMessage`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] ueshin commented on a diff in pull request #39947: [SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception
Posted by "ueshin (via GitHub)" <gi...@apache.org>.
ueshin commented on code in PR #39947:
URL: https://github.com/apache/spark/pull/39947#discussion_r1102172264
##########
python/pyspark/errors/exceptions/connect.py:
##########
@@ -61,41 +97,34 @@ class AnalysisException(SparkConnectGrpcException, BaseAnalysisException):
Failed to analyze a SQL query plan from Spark Connect server.
"""
- def __init__(
- self,
- message: Optional[str] = None,
- error_class: Optional[str] = None,
- message_parameters: Optional[Dict[str, str]] = None,
- plan: Optional[str] = None,
- reason: Optional[str] = None,
- ) -> None:
- self.message = message # type: ignore[assignment]
- if plan is not None:
- self.message = f"{self.message}\nPlan: {plan}"
Review Comment:
In that case, should we still show the plan to be consistent?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on pull request #39947: [SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception
Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #39947:
URL: https://github.com/apache/spark/pull/39947#issuecomment-1423774647
cc @xinrong-meng @grundprinzip @ueshin @zhengruifeng PTAL
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] ueshin commented on a diff in pull request #39947: [SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception
Posted by "ueshin (via GitHub)" <gi...@apache.org>.
ueshin commented on code in PR #39947:
URL: https://github.com/apache/spark/pull/39947#discussion_r1102180018
##########
python/pyspark/errors/exceptions/connect.py:
##########
@@ -61,41 +97,34 @@ class AnalysisException(SparkConnectGrpcException, BaseAnalysisException):
Failed to analyze a SQL query plan from Spark Connect server.
"""
- def __init__(
- self,
- message: Optional[str] = None,
- error_class: Optional[str] = None,
- message_parameters: Optional[Dict[str, str]] = None,
- plan: Optional[str] = None,
- reason: Optional[str] = None,
- ) -> None:
- self.message = message # type: ignore[assignment]
- if plan is not None:
- self.message = f"{self.message}\nPlan: {plan}"
Review Comment:
Ah, got it. 👍
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39947: [SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception
Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #39947:
URL: https://github.com/apache/spark/pull/39947#discussion_r1102138582
##########
python/pyspark/errors/exceptions/connect.py:
##########
@@ -61,41 +97,34 @@ class AnalysisException(SparkConnectGrpcException, BaseAnalysisException):
Failed to analyze a SQL query plan from Spark Connect server.
"""
- def __init__(
- self,
- message: Optional[str] = None,
- error_class: Optional[str] = None,
- message_parameters: Optional[Dict[str, str]] = None,
- plan: Optional[str] = None,
- reason: Optional[str] = None,
- ) -> None:
- self.message = message # type: ignore[assignment]
- if plan is not None:
- self.message = f"{self.message}\nPlan: {plan}"
Review Comment:
Captured one:
```
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/.../spark/python/pyspark/sql/dataframe.py", line 2987, in select
jdf = self._jdf.select(self._jcols(*cols))
File "/.../spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
File "/.../sparkk/python/pyspark/errors/exceptions/captured.py", line 159, in deco
raise converted from None
pyspark.errors.exceptions.captured.AnalysisException: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name `a` cannot be resolved. Did you mean one of the following? [`id`].;
'Project ['a]
+- Range (0, 1, step=1, splits=Some(16))
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39947: [SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception
Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #39947:
URL: https://github.com/apache/spark/pull/39947#discussion_r1101076905
##########
python/pyspark/errors/exceptions/connect.py:
##########
@@ -61,41 +97,34 @@ class AnalysisException(SparkConnectGrpcException, BaseAnalysisException):
Failed to analyze a SQL query plan from Spark Connect server.
"""
- def __init__(
- self,
- message: Optional[str] = None,
- error_class: Optional[str] = None,
- message_parameters: Optional[Dict[str, str]] = None,
- plan: Optional[str] = None,
- reason: Optional[str] = None,
- ) -> None:
- self.message = message # type: ignore[assignment]
- if plan is not None:
- self.message = f"{self.message}\nPlan: {plan}"
Review Comment:
Example:
```python
spark.range(1).select("a").show()
```
```
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 776, in show
print(self._show_string(n, truncate, vertical))
File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 619, in _show_string
pdf = DataFrame.withPlan(
File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 1325, in toPandas
return self._session.client.to_pandas(query)
File "/.../spark/python/pyspark/sql/connect/client.py", line 449, in to_pandas
table, metrics = self._execute_and_fetch(req)
File "/.../spark/python/pyspark/sql/connect/client.py", line 636, in _execute_and_fetch
self._handle_error(rpc_error)
File "/.../spark/python/pyspark/sql/connect/client.py", line 670, in _handle_error
raise convert_exception(info, status.message) from None
pyspark.errors.exceptions.connect.AnalysisException: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name `a` cannot be resolved. Did you mean one of the following? [`id`].;
'Project ['a]
+- Range (0, 1, step=1, splits=Some(16))
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39947: [SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception
Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #39947:
URL: https://github.com/apache/spark/pull/39947#discussion_r1145632881
##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala:
##########
@@ -53,19 +59,37 @@ class SparkConnectService(debug: Boolean)
extends SparkConnectServiceGrpc.SparkConnectServiceImplBase
with Logging {
- private def buildStatusFromThrowable[A <: Throwable with SparkThrowable](st: A): RPCStatus = {
- val t = Option(st.getCause).getOrElse(st)
+ private def allClasses(cl: Class[_]): Seq[Class[_]] = {
+ val classes = ArrayBuffer.empty[Class[_]]
+ if (cl != null && cl != classOf[java.lang.Object]) {
+ classes.append(cl) // Includes itself.
+ }
+
+ @tailrec
+ def appendSuperClasses(clazz: Class[_]): Unit = {
+ if (clazz == null || clazz == classOf[java.lang.Object]) return
+ classes.append(clazz.getSuperclass)
+ appendSuperClasses(clazz.getSuperclass)
+ }
+
+ appendSuperClasses(cl)
+ classes
+ }
+
+ private def buildStatusFromThrowable(st: Throwable): RPCStatus = {
RPCStatus
.newBuilder()
.setCode(RPCCode.INTERNAL_VALUE)
.addDetails(
ProtoAny.pack(
ErrorInfo
.newBuilder()
- .setReason(t.getClass.getName)
+ .setReason(st.getClass.getName)
.setDomain("org.apache.spark")
+ .putMetadata("message", StringUtils.abbreviate(st.getMessage, 2048))
Review Comment:
Yeah, I think we could put it in the body.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39947: [SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception
Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #39947:
URL: https://github.com/apache/spark/pull/39947#discussion_r1102138305
##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala:
##########
@@ -83,27 +107,19 @@ class SparkConnectService(debug: Boolean)
private def handleError[V](
opType: String,
observer: StreamObserver[V]): PartialFunction[Throwable, Unit] = {
- case ae: AnalysisException =>
- logError(s"Error during: $opType", ae)
- val status = RPCStatus
- .newBuilder()
- .setCode(RPCCode.INTERNAL_VALUE)
- .addDetails(
- ProtoAny.pack(
- ErrorInfo
- .newBuilder()
- .setReason(ae.getClass.getName)
- .setDomain("org.apache.spark")
- .putMetadata("message", ae.getSimpleMessage)
- .putMetadata("plan", Option(ae.plan).flatten.map(p => s"$p").getOrElse(""))
- .build()))
- .setMessage(ae.getLocalizedMessage)
- .build()
- observer.onError(StatusProto.toStatusRuntimeException(status))
+ case se: SparkException
+ if se.getCause != null && se.getCause
+ .isInstanceOf[PythonException] && se.getCause.getStackTrace
+ .exists(_.toString.contains("org.apache.spark.sql.execution.python")) =>
+ // Python UDF execution
+ logError(s"Error during: $opType", se)
+ observer.onError(
+ StatusProto.toStatusRuntimeException(buildStatusFromThrowable(se.getCause)))
Review Comment:
```python
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/.../python/pyspark/sql/dataframe.py", line 2987, in select
jdf = self._jdf.select(self._jcols(*cols))
File "/.../python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
File "/...k/python/pyspark/errors/exceptions/captured.py", line 159, in deco
raise converted from None
pyspark.errors.exceptions.captured.AnalysisException: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name `a` cannot be resolved. Did you mean one of the following? [`id`].;
'Project ['a]
+- Range (0, 1, step=1, splits=Some(16))
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39947: [SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception
Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #39947:
URL: https://github.com/apache/spark/pull/39947#discussion_r1101386854
##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala:
##########
@@ -53,19 +59,36 @@ class SparkConnectService(debug: Boolean)
extends SparkConnectServiceGrpc.SparkConnectServiceImplBase
with Logging {
- private def buildStatusFromThrowable[A <: Throwable with SparkThrowable](st: A): RPCStatus = {
- val t = Option(st.getCause).getOrElse(st)
+ private def allClasses(cl: Class[_]): Seq[Class[_]] = {
+ val classes = ArrayBuffer.empty[Class[_]]
+ if (cl != null && !cl.equals(classOf[java.lang.Object])) {
+ classes.append(cl) // Includes itself.
+ }
+
+ @tailrec
+ def appendSuperClasses(clazz: Class[_]): Unit = {
+ if (clazz == null || clazz.equals(classOf[java.lang.Object])) return
+ classes.append(clazz.getSuperclass)
+ appendSuperClasses(clazz.getSuperclass)
+ }
+
+ appendSuperClasses(cl)
+ classes.toSeq
+ }
+
+ private def buildStatusFromThrowable(st: Throwable): RPCStatus = {
RPCStatus
.newBuilder()
.setCode(RPCCode.INTERNAL_VALUE)
.addDetails(
ProtoAny.pack(
ErrorInfo
.newBuilder()
- .setReason(t.getClass.getName)
+ .setReason(st.getClass.getName)
.setDomain("org.apache.spark")
+ .putMetadata("classes", compact(render(allClasses(st.getClass).map(_.getName))))
.build()))
- .setMessage(t.getLocalizedMessage)
Review Comment:
and the doc of `setMessage` mentions that it's fine to send non-localized errors (and expect the client to localize it).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on pull request #39947: [SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception
Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #39947:
URL: https://github.com/apache/spark/pull/39947#issuecomment-1424117888
cc @xinrong-meng @grundprinzip @ueshin @zhengruifeng PTAL
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] HyukjinKwon closed pull request #39947: [SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception
Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon closed pull request #39947: [SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception
URL: https://github.com/apache/spark/pull/39947
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on pull request #39947: [SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception
Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #39947:
URL: https://github.com/apache/spark/pull/39947#issuecomment-1425065686
Merged to master and branch-3.4.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39947: [SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception
Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #39947:
URL: https://github.com/apache/spark/pull/39947#discussion_r1100994134
##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala:
##########
@@ -53,19 +59,37 @@ class SparkConnectService(debug: Boolean)
extends SparkConnectServiceGrpc.SparkConnectServiceImplBase
with Logging {
- private def buildStatusFromThrowable[A <: Throwable with SparkThrowable](st: A): RPCStatus = {
- val t = Option(st.getCause).getOrElse(st)
+ private def allClasses(cl: Class[_]): Seq[Class[_]] = {
+ val classes = ArrayBuffer.empty[Class[_]]
+ if (cl != null && cl != classOf[java.lang.Object]) {
+ classes.append(cl) // Includes itself.
+ }
+
+ @tailrec
+ def appendSuperClasses(clazz: Class[_]): Unit = {
+ if (clazz == null || clazz == classOf[java.lang.Object]) return
+ classes.append(clazz.getSuperclass)
+ appendSuperClasses(clazz.getSuperclass)
+ }
+
+ appendSuperClasses(cl)
+ classes
+ }
+
+ private def buildStatusFromThrowable(st: Throwable): RPCStatus = {
RPCStatus
.newBuilder()
.setCode(RPCCode.INTERNAL_VALUE)
.addDetails(
ProtoAny.pack(
ErrorInfo
.newBuilder()
- .setReason(t.getClass.getName)
+ .setReason(st.getClass.getName)
.setDomain("org.apache.spark")
+ .putMetadata("message", StringUtils.abbreviate(st.getMessage, 2048))
Review Comment:
Otherwise, it complains the header length (8KiB limit). It can be configured below via `NettyChannelBuilder.maxInboundMessageSize` but I didn't change it here, see also https://stackoverflow.com/a/686243
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39947: [SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception
Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #39947:
URL: https://github.com/apache/spark/pull/39947#discussion_r1101075312
##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala:
##########
@@ -83,27 +107,19 @@ class SparkConnectService(debug: Boolean)
private def handleError[V](
opType: String,
observer: StreamObserver[V]): PartialFunction[Throwable, Unit] = {
- case ae: AnalysisException =>
- logError(s"Error during: $opType", ae)
- val status = RPCStatus
- .newBuilder()
- .setCode(RPCCode.INTERNAL_VALUE)
- .addDetails(
- ProtoAny.pack(
- ErrorInfo
- .newBuilder()
- .setReason(ae.getClass.getName)
- .setDomain("org.apache.spark")
- .putMetadata("message", ae.getSimpleMessage)
- .putMetadata("plan", Option(ae.plan).flatten.map(p => s"$p").getOrElse(""))
- .build()))
- .setMessage(ae.getLocalizedMessage)
- .build()
- observer.onError(StatusProto.toStatusRuntimeException(status))
+ case se: SparkException
+ if se.getCause != null && se.getCause
+ .isInstanceOf[PythonException] && se.getCause.getStackTrace
+ .exists(_.toString.contains("org.apache.spark.sql.execution.python")) =>
+ // Python UDF execution
+ logError(s"Error during: $opType", se)
+ observer.onError(
+ StatusProto.toStatusRuntimeException(buildStatusFromThrowable(se.getCause)))
Review Comment:
Example:
```
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 776, in show
print(self._show_string(n, truncate, vertical))
File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 619, in _show_string
pdf = DataFrame.withPlan(
File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 1325, in toPandas
return self._session.client.to_pandas(query)
File "/.../spark/python/pyspark/sql/connect/client.py", line 449, in to_pandas
table, metrics = self._execute_and_fetch(req)
File "/.../spark/python/pyspark/sql/connect/client.py", line 636, in _execute_and_fetch
self._handle_error(rpc_error)
File "/.../spark/python/pyspark/sql/connect/client.py", line 670, in _handle_error
raise convert_exception(info, status.message) from None
pyspark.errors.exceptions.connect.PythonException:
An exception was thrown from the Python worker. Please see the stack trace below.
Traceback (most recent call last):
File "<stdin>", line 3, in aa
ZeroDivisionError: division by zero
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] zhenlineo commented on a diff in pull request #39947: [SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception
Posted by "zhenlineo (via GitHub)" <gi...@apache.org>.
zhenlineo commented on code in PR #39947:
URL: https://github.com/apache/spark/pull/39947#discussion_r1145575350
##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala:
##########
@@ -53,19 +59,37 @@ class SparkConnectService(debug: Boolean)
extends SparkConnectServiceGrpc.SparkConnectServiceImplBase
with Logging {
- private def buildStatusFromThrowable[A <: Throwable with SparkThrowable](st: A): RPCStatus = {
- val t = Option(st.getCause).getOrElse(st)
+ private def allClasses(cl: Class[_]): Seq[Class[_]] = {
+ val classes = ArrayBuffer.empty[Class[_]]
+ if (cl != null && cl != classOf[java.lang.Object]) {
+ classes.append(cl) // Includes itself.
+ }
+
+ @tailrec
+ def appendSuperClasses(clazz: Class[_]): Unit = {
+ if (clazz == null || clazz == classOf[java.lang.Object]) return
+ classes.append(clazz.getSuperclass)
+ appendSuperClasses(clazz.getSuperclass)
+ }
+
+ appendSuperClasses(cl)
+ classes
+ }
+
+ private def buildStatusFromThrowable(st: Throwable): RPCStatus = {
RPCStatus
.newBuilder()
.setCode(RPCCode.INTERNAL_VALUE)
.addDetails(
ProtoAny.pack(
ErrorInfo
.newBuilder()
- .setReason(t.getClass.getName)
+ .setReason(st.getClass.getName)
.setDomain("org.apache.spark")
+ .putMetadata("message", StringUtils.abbreviate(st.getMessage, 2048))
Review Comment:
@HyukjinKwon Am I understand correct the header limit is for the HTTP header? Can we put it in the body? Is there still a ~4K limit?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on pull request #39947: [SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception
Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #39947:
URL: https://github.com/apache/spark/pull/39947#issuecomment-1424165705
cc @xinrong-meng @grundprinzip @ueshin @zhengruifeng PTAL
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39947: [SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception
Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #39947:
URL: https://github.com/apache/spark/pull/39947#discussion_r1101075312
##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala:
##########
@@ -83,27 +107,19 @@ class SparkConnectService(debug: Boolean)
private def handleError[V](
opType: String,
observer: StreamObserver[V]): PartialFunction[Throwable, Unit] = {
- case ae: AnalysisException =>
- logError(s"Error during: $opType", ae)
- val status = RPCStatus
- .newBuilder()
- .setCode(RPCCode.INTERNAL_VALUE)
- .addDetails(
- ProtoAny.pack(
- ErrorInfo
- .newBuilder()
- .setReason(ae.getClass.getName)
- .setDomain("org.apache.spark")
- .putMetadata("message", ae.getSimpleMessage)
- .putMetadata("plan", Option(ae.plan).flatten.map(p => s"$p").getOrElse(""))
- .build()))
- .setMessage(ae.getLocalizedMessage)
- .build()
- observer.onError(StatusProto.toStatusRuntimeException(status))
+ case se: SparkException
+ if se.getCause != null && se.getCause
+ .isInstanceOf[PythonException] && se.getCause.getStackTrace
+ .exists(_.toString.contains("org.apache.spark.sql.execution.python")) =>
+ // Python UDF execution
+ logError(s"Error during: $opType", se)
+ observer.onError(
+ StatusProto.toStatusRuntimeException(buildStatusFromThrowable(se.getCause)))
Review Comment:
Example:
```python
from pyspark.sql.functions import udf
@udf
def aa(a):
1/0
spark.range(1).select(aa("id")).show()
```
```
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 776, in show
print(self._show_string(n, truncate, vertical))
File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 619, in _show_string
pdf = DataFrame.withPlan(
File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 1325, in toPandas
return self._session.client.to_pandas(query)
File "/.../spark/python/pyspark/sql/connect/client.py", line 449, in to_pandas
table, metrics = self._execute_and_fetch(req)
File "/.../spark/python/pyspark/sql/connect/client.py", line 636, in _execute_and_fetch
self._handle_error(rpc_error)
File "/.../spark/python/pyspark/sql/connect/client.py", line 670, in _handle_error
raise convert_exception(info, status.message) from None
pyspark.errors.exceptions.connect.PythonException:
An exception was thrown from the Python worker. Please see the stack trace below.
Traceback (most recent call last):
File "<stdin>", line 3, in aa
ZeroDivisionError: division by zero
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] grundprinzip commented on a diff in pull request #39947: [SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception
Posted by "grundprinzip (via GitHub)" <gi...@apache.org>.
grundprinzip commented on code in PR #39947:
URL: https://github.com/apache/spark/pull/39947#discussion_r1145679097
##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala:
##########
@@ -53,19 +59,37 @@ class SparkConnectService(debug: Boolean)
extends SparkConnectServiceGrpc.SparkConnectServiceImplBase
with Logging {
- private def buildStatusFromThrowable[A <: Throwable with SparkThrowable](st: A): RPCStatus = {
- val t = Option(st.getCause).getOrElse(st)
+ private def allClasses(cl: Class[_]): Seq[Class[_]] = {
+ val classes = ArrayBuffer.empty[Class[_]]
+ if (cl != null && cl != classOf[java.lang.Object]) {
+ classes.append(cl) // Includes itself.
+ }
+
+ @tailrec
+ def appendSuperClasses(clazz: Class[_]): Unit = {
+ if (clazz == null || clazz == classOf[java.lang.Object]) return
+ classes.append(clazz.getSuperclass)
+ appendSuperClasses(clazz.getSuperclass)
+ }
+
+ appendSuperClasses(cl)
+ classes
+ }
+
+ private def buildStatusFromThrowable(st: Throwable): RPCStatus = {
RPCStatus
.newBuilder()
.setCode(RPCCode.INTERNAL_VALUE)
.addDetails(
ProtoAny.pack(
ErrorInfo
.newBuilder()
- .setReason(t.getClass.getName)
+ .setReason(st.getClass.getName)
.setDomain("org.apache.spark")
+ .putMetadata("message", StringUtils.abbreviate(st.getMessage, 2048))
Review Comment:
Why is this needed? We have the abbreviated message already in the body
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39947: [SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception
Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #39947:
URL: https://github.com/apache/spark/pull/39947#discussion_r1146129611
##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala:
##########
@@ -53,19 +59,37 @@ class SparkConnectService(debug: Boolean)
extends SparkConnectServiceGrpc.SparkConnectServiceImplBase
with Logging {
- private def buildStatusFromThrowable[A <: Throwable with SparkThrowable](st: A): RPCStatus = {
- val t = Option(st.getCause).getOrElse(st)
+ private def allClasses(cl: Class[_]): Seq[Class[_]] = {
+ val classes = ArrayBuffer.empty[Class[_]]
+ if (cl != null && cl != classOf[java.lang.Object]) {
+ classes.append(cl) // Includes itself.
+ }
+
+ @tailrec
+ def appendSuperClasses(clazz: Class[_]): Unit = {
+ if (clazz == null || clazz == classOf[java.lang.Object]) return
+ classes.append(clazz.getSuperclass)
+ appendSuperClasses(clazz.getSuperclass)
+ }
+
+ appendSuperClasses(cl)
+ classes
+ }
+
+ private def buildStatusFromThrowable(st: Throwable): RPCStatus = {
RPCStatus
.newBuilder()
.setCode(RPCCode.INTERNAL_VALUE)
.addDetails(
ProtoAny.pack(
ErrorInfo
.newBuilder()
- .setReason(t.getClass.getName)
+ .setReason(st.getClass.getName)
.setDomain("org.apache.spark")
+ .putMetadata("message", StringUtils.abbreviate(st.getMessage, 2048))
Review Comment:
Eh, it was already there before in metadata to print out the stacktrace from the server to the client. The exception was thrown when the size of stacktrace was too big so I made the stractrace string truncated.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39947: [SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception
Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #39947:
URL: https://github.com/apache/spark/pull/39947#discussion_r1100994819
##########
python/pyspark/errors/exceptions/connect.py:
##########
@@ -61,41 +97,34 @@ class AnalysisException(SparkConnectGrpcException, BaseAnalysisException):
Failed to analyze a SQL query plan from Spark Connect server.
"""
- def __init__(
- self,
- message: Optional[str] = None,
- error_class: Optional[str] = None,
- message_parameters: Optional[Dict[str, str]] = None,
- plan: Optional[str] = None,
- reason: Optional[str] = None,
- ) -> None:
- self.message = message # type: ignore[assignment]
- if plan is not None:
- self.message = f"{self.message}\nPlan: {plan}"
Review Comment:
The original `AnalysisException.getMessage` contains the string representation of the underlying plan.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39947: [SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception
Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #39947:
URL: https://github.com/apache/spark/pull/39947#discussion_r1103907875
##########
python/pyspark/sql/dataframe.py:
##########
@@ -4598,7 +4598,7 @@ def freqItems(
Examples
--------
>>> df = spark.createDataFrame([(1, 11), (1, 11), (3, 10), (4, 8), (4, 8)], ["c1", "c2"])
- >>> df.freqItems(["c1", "c2"]).show() # doctest: +SKIP
+ >>> df.freqItems(["c1", "c2"]).show()
Review Comment:
Hi, @HyukjinKwon and @ueshin . Unfortunately, this broke Scala 2.13 CI. I made a followup.
- https://github.com/apache/spark/pull/39983
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on pull request #39947: [SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception
Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #39947:
URL: https://github.com/apache/spark/pull/39947#issuecomment-1424133765
cc @xinrong-meng @grundprinzip @ueshin @zhengruifeng PTAL
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39947: [SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception
Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #39947:
URL: https://github.com/apache/spark/pull/39947#discussion_r1101440253
##########
python/pyspark/sql/tests/pandas/test_pandas_udf.py:
##########
@@ -171,7 +171,7 @@ def test_stopiteration_in_udf(self):
def foo(x):
raise StopIteration()
- exc_message = "Caught StopIteration thrown from user's code; failing the task"
Review Comment:
The error message is too long, and it gets abbreviated in Connect case.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] zhenlineo commented on a diff in pull request #39947: [SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception
Posted by "zhenlineo (via GitHub)" <gi...@apache.org>.
zhenlineo commented on code in PR #39947:
URL: https://github.com/apache/spark/pull/39947#discussion_r1145575350
##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala:
##########
@@ -53,19 +59,37 @@ class SparkConnectService(debug: Boolean)
extends SparkConnectServiceGrpc.SparkConnectServiceImplBase
with Logging {
- private def buildStatusFromThrowable[A <: Throwable with SparkThrowable](st: A): RPCStatus = {
- val t = Option(st.getCause).getOrElse(st)
+ private def allClasses(cl: Class[_]): Seq[Class[_]] = {
+ val classes = ArrayBuffer.empty[Class[_]]
+ if (cl != null && cl != classOf[java.lang.Object]) {
+ classes.append(cl) // Includes itself.
+ }
+
+ @tailrec
+ def appendSuperClasses(clazz: Class[_]): Unit = {
+ if (clazz == null || clazz == classOf[java.lang.Object]) return
+ classes.append(clazz.getSuperclass)
+ appendSuperClasses(clazz.getSuperclass)
+ }
+
+ appendSuperClasses(cl)
+ classes
+ }
+
+ private def buildStatusFromThrowable(st: Throwable): RPCStatus = {
RPCStatus
.newBuilder()
.setCode(RPCCode.INTERNAL_VALUE)
.addDetails(
ProtoAny.pack(
ErrorInfo
.newBuilder()
- .setReason(t.getClass.getName)
+ .setReason(st.getClass.getName)
.setDomain("org.apache.spark")
+ .putMetadata("message", StringUtils.abbreviate(st.getMessage, 2048))
Review Comment:
@HyukjinKwon Am I understand correctly the header limit is for the HTTP header? Can we put it in the body? Is there still a ~4K limit?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org