You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "HyukjinKwon (via GitHub)" <gi...@apache.org> on 2023/02/09 03:20:43 UTC

[GitHub] [spark] HyukjinKwon opened a new pull request, #39947: [SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception

HyukjinKwon opened a new pull request, #39947:
URL: https://github.com/apache/spark/pull/39947

   ### What changes were proposed in this pull request?
   
   This PR proposes to take the super classes into account when throwing an exception from the server to Python side by adding more metadata of classes, causes and traceback in JVM.
   
   In addition, this PR matches the exceptions being thrown to the regular PySpark exceptions defined:
   
   https://github.com/apache/spark/blob/04550edd49ee587656d215e59d6a072772d7d5ec/python/pyspark/errors/exceptions/captured.py#L108-L147
   
   
   ### Why are the changes needed?
   
   Right now, many exceptions cannot be handled (e.g., `NoSuchDatabaseException` that inherits `AnalysisException`) in Python side.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No to end users.
   Yes, it matches the exceptions to the regular PySpark exceptions.
   
   ### How was this patch tested?
   
   Unittests fixed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #39947: [SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #39947:
URL: https://github.com/apache/spark/pull/39947#issuecomment-1424167744

   cc @xinrong-meng @grundprinzip @ueshin @zhengruifeng PTAL


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ueshin commented on a diff in pull request #39947: [SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception

Posted by "ueshin (via GitHub)" <gi...@apache.org>.
ueshin commented on code in PR #39947:
URL: https://github.com/apache/spark/pull/39947#discussion_r1102122698


##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala:
##########
@@ -83,39 +106,35 @@ class SparkConnectService(debug: Boolean)
   private def handleError[V](
       opType: String,
       observer: StreamObserver[V]): PartialFunction[Throwable, Unit] = {
-    case ae: AnalysisException =>
-      logError(s"Error during: $opType", ae)
-      val status = RPCStatus
-        .newBuilder()
-        .setCode(RPCCode.INTERNAL_VALUE)
-        .addDetails(
-          ProtoAny.pack(
-            ErrorInfo
-              .newBuilder()
-              .setReason(ae.getClass.getName)
-              .setDomain("org.apache.spark")
-              .putMetadata("message", ae.getSimpleMessage)
-              .putMetadata("plan", Option(ae.plan).flatten.map(p => s"$p").getOrElse(""))
-              .build()))
-        .setMessage(ae.getLocalizedMessage)
-        .build()
-      observer.onError(StatusProto.toStatusRuntimeException(status))
+    case se: SparkException
+        if se.getCause != null && se.getCause
+          .isInstanceOf[PythonException] && se.getCause.getStackTrace
+          .exists(_.toString.contains("org.apache.spark.sql.execution.python")) =>
+      // Python UDF execution

Review Comment:
   I guess we should extract this part to a method with a proper name for readability?



##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala:
##########
@@ -83,39 +106,35 @@ class SparkConnectService(debug: Boolean)
   private def handleError[V](
       opType: String,
       observer: StreamObserver[V]): PartialFunction[Throwable, Unit] = {
-    case ae: AnalysisException =>
-      logError(s"Error during: $opType", ae)
-      val status = RPCStatus
-        .newBuilder()
-        .setCode(RPCCode.INTERNAL_VALUE)
-        .addDetails(
-          ProtoAny.pack(
-            ErrorInfo
-              .newBuilder()
-              .setReason(ae.getClass.getName)
-              .setDomain("org.apache.spark")
-              .putMetadata("message", ae.getSimpleMessage)
-              .putMetadata("plan", Option(ae.plan).flatten.map(p => s"$p").getOrElse(""))
-              .build()))
-        .setMessage(ae.getLocalizedMessage)
-        .build()
-      observer.onError(StatusProto.toStatusRuntimeException(status))
+    case se: SparkException
+        if se.getCause != null && se.getCause
+          .isInstanceOf[PythonException] && se.getCause.getStackTrace
+          .exists(_.toString.contains("org.apache.spark.sql.execution.python")) =>
+      // Python UDF execution
+      logError(s"Error during: $opType", se)
+      observer.onError(
+        StatusProto.toStatusRuntimeException(buildStatusFromThrowable(se.getCause)))
+
     case st: SparkThrowable =>
       logError(s"Error during: $opType", st)
-      val status = buildStatusFromThrowable(st)
-      observer.onError(StatusProto.toStatusRuntimeException(status))
+      observer.onError(StatusProto.toStatusRuntimeException(buildStatusFromThrowable(st)))
+
     case NonFatal(nf) =>
       logError(s"Error during: $opType", nf)
       val status = RPCStatus
         .newBuilder()
         .setCode(RPCCode.INTERNAL_VALUE)
-        .setMessage(nf.getLocalizedMessage)
+        .setMessage(StringUtils.abbreviate(nf.getMessage, 2048))
         .build()
       observer.onError(StatusProto.toStatusRuntimeException(status))

Review Comment:
   Need to add the details: #39957.



##########
python/pyspark/errors/exceptions/connect.py:
##########
@@ -61,41 +97,34 @@ class AnalysisException(SparkConnectGrpcException, BaseAnalysisException):
     Failed to analyze a SQL query plan from Spark Connect server.
     """
 
-    def __init__(
-        self,
-        message: Optional[str] = None,
-        error_class: Optional[str] = None,
-        message_parameters: Optional[Dict[str, str]] = None,
-        plan: Optional[str] = None,
-        reason: Optional[str] = None,
-    ) -> None:
-        self.message = message  # type: ignore[assignment]
-        if plan is not None:
-            self.message = f"{self.message}\nPlan: {plan}"

Review Comment:
   What's the captured one's stack trace like?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39947: [SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #39947:
URL: https://github.com/apache/spark/pull/39947#discussion_r1101374445


##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala:
##########
@@ -53,19 +59,36 @@ class SparkConnectService(debug: Boolean)
     extends SparkConnectServiceGrpc.SparkConnectServiceImplBase
     with Logging {
 
-  private def buildStatusFromThrowable[A <: Throwable with SparkThrowable](st: A): RPCStatus = {
-    val t = Option(st.getCause).getOrElse(st)
+  private def allClasses(cl: Class[_]): Seq[Class[_]] = {
+    val classes = ArrayBuffer.empty[Class[_]]
+    if (cl != null && !cl.equals(classOf[java.lang.Object])) {
+      classes.append(cl) // Includes itself.
+    }
+
+    @tailrec
+    def appendSuperClasses(clazz: Class[_]): Unit = {
+      if (clazz == null || clazz.equals(classOf[java.lang.Object])) return
+      classes.append(clazz.getSuperclass)
+      appendSuperClasses(clazz.getSuperclass)
+    }
+
+    appendSuperClasses(cl)
+    classes.toSeq
+  }
+
+  private def buildStatusFromThrowable(st: Throwable): RPCStatus = {
     RPCStatus
       .newBuilder()
       .setCode(RPCCode.INTERNAL_VALUE)
       .addDetails(
         ProtoAny.pack(
           ErrorInfo
             .newBuilder()
-            .setReason(t.getClass.getName)
+            .setReason(st.getClass.getName)
             .setDomain("org.apache.spark")
+            .putMetadata("classes", compact(render(allClasses(st.getClass).map(_.getName))))
             .build()))
-      .setMessage(t.getLocalizedMessage)

Review Comment:
   `getLocalizedMessage` is not used in our codebase.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39947: [SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #39947:
URL: https://github.com/apache/spark/pull/39947#discussion_r1102174832


##########
python/pyspark/errors/exceptions/connect.py:
##########
@@ -61,41 +97,34 @@ class AnalysisException(SparkConnectGrpcException, BaseAnalysisException):
     Failed to analyze a SQL query plan from Spark Connect server.
     """
 
-    def __init__(
-        self,
-        message: Optional[str] = None,
-        error_class: Optional[str] = None,
-        message_parameters: Optional[Dict[str, str]] = None,
-        plan: Optional[str] = None,
-        reason: Optional[str] = None,
-    ) -> None:
-        self.message = message  # type: ignore[assignment]
-        if plan is not None:
-            self.message = f"{self.message}\nPlan: {plan}"

Review Comment:
   eh, yeah. It still shows the plan. This is the part of `getMessage`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ueshin commented on a diff in pull request #39947: [SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception

Posted by "ueshin (via GitHub)" <gi...@apache.org>.
ueshin commented on code in PR #39947:
URL: https://github.com/apache/spark/pull/39947#discussion_r1102172264


##########
python/pyspark/errors/exceptions/connect.py:
##########
@@ -61,41 +97,34 @@ class AnalysisException(SparkConnectGrpcException, BaseAnalysisException):
     Failed to analyze a SQL query plan from Spark Connect server.
     """
 
-    def __init__(
-        self,
-        message: Optional[str] = None,
-        error_class: Optional[str] = None,
-        message_parameters: Optional[Dict[str, str]] = None,
-        plan: Optional[str] = None,
-        reason: Optional[str] = None,
-    ) -> None:
-        self.message = message  # type: ignore[assignment]
-        if plan is not None:
-            self.message = f"{self.message}\nPlan: {plan}"

Review Comment:
   In that case, should we still show the plan to be consistent?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #39947: [SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #39947:
URL: https://github.com/apache/spark/pull/39947#issuecomment-1423774647

   cc @xinrong-meng @grundprinzip @ueshin @zhengruifeng PTAL


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ueshin commented on a diff in pull request #39947: [SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception

Posted by "ueshin (via GitHub)" <gi...@apache.org>.
ueshin commented on code in PR #39947:
URL: https://github.com/apache/spark/pull/39947#discussion_r1102180018


##########
python/pyspark/errors/exceptions/connect.py:
##########
@@ -61,41 +97,34 @@ class AnalysisException(SparkConnectGrpcException, BaseAnalysisException):
     Failed to analyze a SQL query plan from Spark Connect server.
     """
 
-    def __init__(
-        self,
-        message: Optional[str] = None,
-        error_class: Optional[str] = None,
-        message_parameters: Optional[Dict[str, str]] = None,
-        plan: Optional[str] = None,
-        reason: Optional[str] = None,
-    ) -> None:
-        self.message = message  # type: ignore[assignment]
-        if plan is not None:
-            self.message = f"{self.message}\nPlan: {plan}"

Review Comment:
   Ah, got it. 👍 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39947: [SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #39947:
URL: https://github.com/apache/spark/pull/39947#discussion_r1102138582


##########
python/pyspark/errors/exceptions/connect.py:
##########
@@ -61,41 +97,34 @@ class AnalysisException(SparkConnectGrpcException, BaseAnalysisException):
     Failed to analyze a SQL query plan from Spark Connect server.
     """
 
-    def __init__(
-        self,
-        message: Optional[str] = None,
-        error_class: Optional[str] = None,
-        message_parameters: Optional[Dict[str, str]] = None,
-        plan: Optional[str] = None,
-        reason: Optional[str] = None,
-    ) -> None:
-        self.message = message  # type: ignore[assignment]
-        if plan is not None:
-            self.message = f"{self.message}\nPlan: {plan}"

Review Comment:
   Captured one:
   
   ```
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "/.../spark/python/pyspark/sql/dataframe.py", line 2987, in select
       jdf = self._jdf.select(self._jcols(*cols))
     File "/.../spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
     File "/.../sparkk/python/pyspark/errors/exceptions/captured.py", line 159, in deco
       raise converted from None
   pyspark.errors.exceptions.captured.AnalysisException: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name `a` cannot be resolved. Did you mean one of the following? [`id`].;
   'Project ['a]
   +- Range (0, 1, step=1, splits=Some(16))
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39947: [SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #39947:
URL: https://github.com/apache/spark/pull/39947#discussion_r1101076905


##########
python/pyspark/errors/exceptions/connect.py:
##########
@@ -61,41 +97,34 @@ class AnalysisException(SparkConnectGrpcException, BaseAnalysisException):
     Failed to analyze a SQL query plan from Spark Connect server.
     """
 
-    def __init__(
-        self,
-        message: Optional[str] = None,
-        error_class: Optional[str] = None,
-        message_parameters: Optional[Dict[str, str]] = None,
-        plan: Optional[str] = None,
-        reason: Optional[str] = None,
-    ) -> None:
-        self.message = message  # type: ignore[assignment]
-        if plan is not None:
-            self.message = f"{self.message}\nPlan: {plan}"

Review Comment:
   Example:
   
   ```python
   spark.range(1).select("a").show()
   ```
   
   ```
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 776, in show
       print(self._show_string(n, truncate, vertical))
     File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 619, in _show_string
       pdf = DataFrame.withPlan(
     File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 1325, in toPandas
       return self._session.client.to_pandas(query)
     File "/.../spark/python/pyspark/sql/connect/client.py", line 449, in to_pandas
       table, metrics = self._execute_and_fetch(req)
     File "/.../spark/python/pyspark/sql/connect/client.py", line 636, in _execute_and_fetch
       self._handle_error(rpc_error)
     File "/.../spark/python/pyspark/sql/connect/client.py", line 670, in _handle_error
       raise convert_exception(info, status.message) from None
   pyspark.errors.exceptions.connect.AnalysisException: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name `a` cannot be resolved. Did you mean one of the following? [`id`].;
   'Project ['a]
   +- Range (0, 1, step=1, splits=Some(16))
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39947: [SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #39947:
URL: https://github.com/apache/spark/pull/39947#discussion_r1145632881


##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala:
##########
@@ -53,19 +59,37 @@ class SparkConnectService(debug: Boolean)
     extends SparkConnectServiceGrpc.SparkConnectServiceImplBase
     with Logging {
 
-  private def buildStatusFromThrowable[A <: Throwable with SparkThrowable](st: A): RPCStatus = {
-    val t = Option(st.getCause).getOrElse(st)
+  private def allClasses(cl: Class[_]): Seq[Class[_]] = {
+    val classes = ArrayBuffer.empty[Class[_]]
+    if (cl != null && cl != classOf[java.lang.Object]) {
+      classes.append(cl) // Includes itself.
+    }
+
+    @tailrec
+    def appendSuperClasses(clazz: Class[_]): Unit = {
+      if (clazz == null || clazz == classOf[java.lang.Object]) return
+      classes.append(clazz.getSuperclass)
+      appendSuperClasses(clazz.getSuperclass)
+    }
+
+    appendSuperClasses(cl)
+    classes
+  }
+
+  private def buildStatusFromThrowable(st: Throwable): RPCStatus = {
     RPCStatus
       .newBuilder()
       .setCode(RPCCode.INTERNAL_VALUE)
       .addDetails(
         ProtoAny.pack(
           ErrorInfo
             .newBuilder()
-            .setReason(t.getClass.getName)
+            .setReason(st.getClass.getName)
             .setDomain("org.apache.spark")
+            .putMetadata("message", StringUtils.abbreviate(st.getMessage, 2048))

Review Comment:
   Yeah, I think we could put it in the body.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39947: [SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #39947:
URL: https://github.com/apache/spark/pull/39947#discussion_r1102138305


##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala:
##########
@@ -83,27 +107,19 @@ class SparkConnectService(debug: Boolean)
   private def handleError[V](
       opType: String,
       observer: StreamObserver[V]): PartialFunction[Throwable, Unit] = {
-    case ae: AnalysisException =>
-      logError(s"Error during: $opType", ae)
-      val status = RPCStatus
-        .newBuilder()
-        .setCode(RPCCode.INTERNAL_VALUE)
-        .addDetails(
-          ProtoAny.pack(
-            ErrorInfo
-              .newBuilder()
-              .setReason(ae.getClass.getName)
-              .setDomain("org.apache.spark")
-              .putMetadata("message", ae.getSimpleMessage)
-              .putMetadata("plan", Option(ae.plan).flatten.map(p => s"$p").getOrElse(""))
-              .build()))
-        .setMessage(ae.getLocalizedMessage)
-        .build()
-      observer.onError(StatusProto.toStatusRuntimeException(status))
+    case se: SparkException
+        if se.getCause != null && se.getCause
+          .isInstanceOf[PythonException] && se.getCause.getStackTrace
+          .exists(_.toString.contains("org.apache.spark.sql.execution.python")) =>
+      // Python UDF execution
+      logError(s"Error during: $opType", se)
+      observer.onError(
+        StatusProto.toStatusRuntimeException(buildStatusFromThrowable(se.getCause)))

Review Comment:
   ```python
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "/.../python/pyspark/sql/dataframe.py", line 2987, in select
       jdf = self._jdf.select(self._jcols(*cols))
     File "/.../python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
     File "/...k/python/pyspark/errors/exceptions/captured.py", line 159, in deco
       raise converted from None
   pyspark.errors.exceptions.captured.AnalysisException: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name `a` cannot be resolved. Did you mean one of the following? [`id`].;
   'Project ['a]
   +- Range (0, 1, step=1, splits=Some(16))
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39947: [SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #39947:
URL: https://github.com/apache/spark/pull/39947#discussion_r1101386854


##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala:
##########
@@ -53,19 +59,36 @@ class SparkConnectService(debug: Boolean)
     extends SparkConnectServiceGrpc.SparkConnectServiceImplBase
     with Logging {
 
-  private def buildStatusFromThrowable[A <: Throwable with SparkThrowable](st: A): RPCStatus = {
-    val t = Option(st.getCause).getOrElse(st)
+  private def allClasses(cl: Class[_]): Seq[Class[_]] = {
+    val classes = ArrayBuffer.empty[Class[_]]
+    if (cl != null && !cl.equals(classOf[java.lang.Object])) {
+      classes.append(cl) // Includes itself.
+    }
+
+    @tailrec
+    def appendSuperClasses(clazz: Class[_]): Unit = {
+      if (clazz == null || clazz.equals(classOf[java.lang.Object])) return
+      classes.append(clazz.getSuperclass)
+      appendSuperClasses(clazz.getSuperclass)
+    }
+
+    appendSuperClasses(cl)
+    classes.toSeq
+  }
+
+  private def buildStatusFromThrowable(st: Throwable): RPCStatus = {
     RPCStatus
       .newBuilder()
       .setCode(RPCCode.INTERNAL_VALUE)
       .addDetails(
         ProtoAny.pack(
           ErrorInfo
             .newBuilder()
-            .setReason(t.getClass.getName)
+            .setReason(st.getClass.getName)
             .setDomain("org.apache.spark")
+            .putMetadata("classes", compact(render(allClasses(st.getClass).map(_.getName))))
             .build()))
-      .setMessage(t.getLocalizedMessage)

Review Comment:
   and the doc of `setMessage` mentions that it's fine to send non-localized errors (and expect the client to localize it).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #39947: [SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #39947:
URL: https://github.com/apache/spark/pull/39947#issuecomment-1424117888

   cc @xinrong-meng @grundprinzip @ueshin @zhengruifeng PTAL


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon closed pull request #39947: [SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon closed pull request #39947: [SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception
URL: https://github.com/apache/spark/pull/39947


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #39947: [SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #39947:
URL: https://github.com/apache/spark/pull/39947#issuecomment-1425065686

   Merged to master and branch-3.4.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39947: [SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #39947:
URL: https://github.com/apache/spark/pull/39947#discussion_r1100994134


##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala:
##########
@@ -53,19 +59,37 @@ class SparkConnectService(debug: Boolean)
     extends SparkConnectServiceGrpc.SparkConnectServiceImplBase
     with Logging {
 
-  private def buildStatusFromThrowable[A <: Throwable with SparkThrowable](st: A): RPCStatus = {
-    val t = Option(st.getCause).getOrElse(st)
+  private def allClasses(cl: Class[_]): Seq[Class[_]] = {
+    val classes = ArrayBuffer.empty[Class[_]]
+    if (cl != null && cl != classOf[java.lang.Object]) {
+      classes.append(cl) // Includes itself.
+    }
+
+    @tailrec
+    def appendSuperClasses(clazz: Class[_]): Unit = {
+      if (clazz == null || clazz == classOf[java.lang.Object]) return
+      classes.append(clazz.getSuperclass)
+      appendSuperClasses(clazz.getSuperclass)
+    }
+
+    appendSuperClasses(cl)
+    classes
+  }
+
+  private def buildStatusFromThrowable(st: Throwable): RPCStatus = {
     RPCStatus
       .newBuilder()
       .setCode(RPCCode.INTERNAL_VALUE)
       .addDetails(
         ProtoAny.pack(
           ErrorInfo
             .newBuilder()
-            .setReason(t.getClass.getName)
+            .setReason(st.getClass.getName)
             .setDomain("org.apache.spark")
+            .putMetadata("message", StringUtils.abbreviate(st.getMessage, 2048))

Review Comment:
   Otherwise, it complains the header length (8KiB limit). It can be configured below via `NettyChannelBuilder.maxInboundMessageSize` but I didn't change it here, see also https://stackoverflow.com/a/686243



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39947: [SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #39947:
URL: https://github.com/apache/spark/pull/39947#discussion_r1101075312


##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala:
##########
@@ -83,27 +107,19 @@ class SparkConnectService(debug: Boolean)
   private def handleError[V](
       opType: String,
       observer: StreamObserver[V]): PartialFunction[Throwable, Unit] = {
-    case ae: AnalysisException =>
-      logError(s"Error during: $opType", ae)
-      val status = RPCStatus
-        .newBuilder()
-        .setCode(RPCCode.INTERNAL_VALUE)
-        .addDetails(
-          ProtoAny.pack(
-            ErrorInfo
-              .newBuilder()
-              .setReason(ae.getClass.getName)
-              .setDomain("org.apache.spark")
-              .putMetadata("message", ae.getSimpleMessage)
-              .putMetadata("plan", Option(ae.plan).flatten.map(p => s"$p").getOrElse(""))
-              .build()))
-        .setMessage(ae.getLocalizedMessage)
-        .build()
-      observer.onError(StatusProto.toStatusRuntimeException(status))
+    case se: SparkException
+        if se.getCause != null && se.getCause
+          .isInstanceOf[PythonException] && se.getCause.getStackTrace
+          .exists(_.toString.contains("org.apache.spark.sql.execution.python")) =>
+      // Python UDF execution
+      logError(s"Error during: $opType", se)
+      observer.onError(
+        StatusProto.toStatusRuntimeException(buildStatusFromThrowable(se.getCause)))

Review Comment:
   Example:
   
   ```
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 776, in show
       print(self._show_string(n, truncate, vertical))
     File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 619, in _show_string
       pdf = DataFrame.withPlan(
     File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 1325, in toPandas
       return self._session.client.to_pandas(query)
     File "/.../spark/python/pyspark/sql/connect/client.py", line 449, in to_pandas
       table, metrics = self._execute_and_fetch(req)
     File "/.../spark/python/pyspark/sql/connect/client.py", line 636, in _execute_and_fetch
       self._handle_error(rpc_error)
     File "/.../spark/python/pyspark/sql/connect/client.py", line 670, in _handle_error
       raise convert_exception(info, status.message) from None
   pyspark.errors.exceptions.connect.PythonException:
     An exception was thrown from the Python worker. Please see the stack trace below.
   Traceback (most recent call last):
     File "<stdin>", line 3, in aa
   ZeroDivisionError: division by zero
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhenlineo commented on a diff in pull request #39947: [SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception

Posted by "zhenlineo (via GitHub)" <gi...@apache.org>.
zhenlineo commented on code in PR #39947:
URL: https://github.com/apache/spark/pull/39947#discussion_r1145575350


##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala:
##########
@@ -53,19 +59,37 @@ class SparkConnectService(debug: Boolean)
     extends SparkConnectServiceGrpc.SparkConnectServiceImplBase
     with Logging {
 
-  private def buildStatusFromThrowable[A <: Throwable with SparkThrowable](st: A): RPCStatus = {
-    val t = Option(st.getCause).getOrElse(st)
+  private def allClasses(cl: Class[_]): Seq[Class[_]] = {
+    val classes = ArrayBuffer.empty[Class[_]]
+    if (cl != null && cl != classOf[java.lang.Object]) {
+      classes.append(cl) // Includes itself.
+    }
+
+    @tailrec
+    def appendSuperClasses(clazz: Class[_]): Unit = {
+      if (clazz == null || clazz == classOf[java.lang.Object]) return
+      classes.append(clazz.getSuperclass)
+      appendSuperClasses(clazz.getSuperclass)
+    }
+
+    appendSuperClasses(cl)
+    classes
+  }
+
+  private def buildStatusFromThrowable(st: Throwable): RPCStatus = {
     RPCStatus
       .newBuilder()
       .setCode(RPCCode.INTERNAL_VALUE)
       .addDetails(
         ProtoAny.pack(
           ErrorInfo
             .newBuilder()
-            .setReason(t.getClass.getName)
+            .setReason(st.getClass.getName)
             .setDomain("org.apache.spark")
+            .putMetadata("message", StringUtils.abbreviate(st.getMessage, 2048))

Review Comment:
   @HyukjinKwon Am I understand correct the header limit is for the HTTP header? Can we put it in the body? Is there still a ~4K limit?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #39947: [SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #39947:
URL: https://github.com/apache/spark/pull/39947#issuecomment-1424165705

   cc @xinrong-meng @grundprinzip @ueshin @zhengruifeng PTAL


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39947: [SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #39947:
URL: https://github.com/apache/spark/pull/39947#discussion_r1101075312


##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala:
##########
@@ -83,27 +107,19 @@ class SparkConnectService(debug: Boolean)
   private def handleError[V](
       opType: String,
       observer: StreamObserver[V]): PartialFunction[Throwable, Unit] = {
-    case ae: AnalysisException =>
-      logError(s"Error during: $opType", ae)
-      val status = RPCStatus
-        .newBuilder()
-        .setCode(RPCCode.INTERNAL_VALUE)
-        .addDetails(
-          ProtoAny.pack(
-            ErrorInfo
-              .newBuilder()
-              .setReason(ae.getClass.getName)
-              .setDomain("org.apache.spark")
-              .putMetadata("message", ae.getSimpleMessage)
-              .putMetadata("plan", Option(ae.plan).flatten.map(p => s"$p").getOrElse(""))
-              .build()))
-        .setMessage(ae.getLocalizedMessage)
-        .build()
-      observer.onError(StatusProto.toStatusRuntimeException(status))
+    case se: SparkException
+        if se.getCause != null && se.getCause
+          .isInstanceOf[PythonException] && se.getCause.getStackTrace
+          .exists(_.toString.contains("org.apache.spark.sql.execution.python")) =>
+      // Python UDF execution
+      logError(s"Error during: $opType", se)
+      observer.onError(
+        StatusProto.toStatusRuntimeException(buildStatusFromThrowable(se.getCause)))

Review Comment:
   Example:
   
   ```python
   from pyspark.sql.functions import udf
   @udf
   def aa(a):
       1/0
   
   spark.range(1).select(aa("id")).show()
   ```
   
   ```
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 776, in show
       print(self._show_string(n, truncate, vertical))
     File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 619, in _show_string
       pdf = DataFrame.withPlan(
     File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 1325, in toPandas
       return self._session.client.to_pandas(query)
     File "/.../spark/python/pyspark/sql/connect/client.py", line 449, in to_pandas
       table, metrics = self._execute_and_fetch(req)
     File "/.../spark/python/pyspark/sql/connect/client.py", line 636, in _execute_and_fetch
       self._handle_error(rpc_error)
     File "/.../spark/python/pyspark/sql/connect/client.py", line 670, in _handle_error
       raise convert_exception(info, status.message) from None
   pyspark.errors.exceptions.connect.PythonException:
     An exception was thrown from the Python worker. Please see the stack trace below.
   Traceback (most recent call last):
     File "<stdin>", line 3, in aa
   ZeroDivisionError: division by zero
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] grundprinzip commented on a diff in pull request #39947: [SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception

Posted by "grundprinzip (via GitHub)" <gi...@apache.org>.
grundprinzip commented on code in PR #39947:
URL: https://github.com/apache/spark/pull/39947#discussion_r1145679097


##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala:
##########
@@ -53,19 +59,37 @@ class SparkConnectService(debug: Boolean)
     extends SparkConnectServiceGrpc.SparkConnectServiceImplBase
     with Logging {
 
-  private def buildStatusFromThrowable[A <: Throwable with SparkThrowable](st: A): RPCStatus = {
-    val t = Option(st.getCause).getOrElse(st)
+  private def allClasses(cl: Class[_]): Seq[Class[_]] = {
+    val classes = ArrayBuffer.empty[Class[_]]
+    if (cl != null && cl != classOf[java.lang.Object]) {
+      classes.append(cl) // Includes itself.
+    }
+
+    @tailrec
+    def appendSuperClasses(clazz: Class[_]): Unit = {
+      if (clazz == null || clazz == classOf[java.lang.Object]) return
+      classes.append(clazz.getSuperclass)
+      appendSuperClasses(clazz.getSuperclass)
+    }
+
+    appendSuperClasses(cl)
+    classes
+  }
+
+  private def buildStatusFromThrowable(st: Throwable): RPCStatus = {
     RPCStatus
       .newBuilder()
       .setCode(RPCCode.INTERNAL_VALUE)
       .addDetails(
         ProtoAny.pack(
           ErrorInfo
             .newBuilder()
-            .setReason(t.getClass.getName)
+            .setReason(st.getClass.getName)
             .setDomain("org.apache.spark")
+            .putMetadata("message", StringUtils.abbreviate(st.getMessage, 2048))

Review Comment:
   Why is this needed? We have the abbreviated message already in the body



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39947: [SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #39947:
URL: https://github.com/apache/spark/pull/39947#discussion_r1146129611


##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala:
##########
@@ -53,19 +59,37 @@ class SparkConnectService(debug: Boolean)
     extends SparkConnectServiceGrpc.SparkConnectServiceImplBase
     with Logging {
 
-  private def buildStatusFromThrowable[A <: Throwable with SparkThrowable](st: A): RPCStatus = {
-    val t = Option(st.getCause).getOrElse(st)
+  private def allClasses(cl: Class[_]): Seq[Class[_]] = {
+    val classes = ArrayBuffer.empty[Class[_]]
+    if (cl != null && cl != classOf[java.lang.Object]) {
+      classes.append(cl) // Includes itself.
+    }
+
+    @tailrec
+    def appendSuperClasses(clazz: Class[_]): Unit = {
+      if (clazz == null || clazz == classOf[java.lang.Object]) return
+      classes.append(clazz.getSuperclass)
+      appendSuperClasses(clazz.getSuperclass)
+    }
+
+    appendSuperClasses(cl)
+    classes
+  }
+
+  private def buildStatusFromThrowable(st: Throwable): RPCStatus = {
     RPCStatus
       .newBuilder()
       .setCode(RPCCode.INTERNAL_VALUE)
       .addDetails(
         ProtoAny.pack(
           ErrorInfo
             .newBuilder()
-            .setReason(t.getClass.getName)
+            .setReason(st.getClass.getName)
             .setDomain("org.apache.spark")
+            .putMetadata("message", StringUtils.abbreviate(st.getMessage, 2048))

Review Comment:
   Eh, it was already there before in metadata to print out the stacktrace from the server to the client. The exception was thrown when the size of stacktrace was too big so I made the stractrace string truncated.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39947: [SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #39947:
URL: https://github.com/apache/spark/pull/39947#discussion_r1100994819


##########
python/pyspark/errors/exceptions/connect.py:
##########
@@ -61,41 +97,34 @@ class AnalysisException(SparkConnectGrpcException, BaseAnalysisException):
     Failed to analyze a SQL query plan from Spark Connect server.
     """
 
-    def __init__(
-        self,
-        message: Optional[str] = None,
-        error_class: Optional[str] = None,
-        message_parameters: Optional[Dict[str, str]] = None,
-        plan: Optional[str] = None,
-        reason: Optional[str] = None,
-    ) -> None:
-        self.message = message  # type: ignore[assignment]
-        if plan is not None:
-            self.message = f"{self.message}\nPlan: {plan}"

Review Comment:
   The original `AnalysisException.getMessage` contains the string representation of the underlying plan.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39947: [SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #39947:
URL: https://github.com/apache/spark/pull/39947#discussion_r1103907875


##########
python/pyspark/sql/dataframe.py:
##########
@@ -4598,7 +4598,7 @@ def freqItems(
         Examples
         --------
         >>> df = spark.createDataFrame([(1, 11), (1, 11), (3, 10), (4, 8), (4, 8)], ["c1", "c2"])
-        >>> df.freqItems(["c1", "c2"]).show()  # doctest: +SKIP
+        >>> df.freqItems(["c1", "c2"]).show()

Review Comment:
   Hi, @HyukjinKwon and @ueshin . Unfortunately, this broke Scala 2.13 CI. I made a followup.
   - https://github.com/apache/spark/pull/39983



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #39947: [SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #39947:
URL: https://github.com/apache/spark/pull/39947#issuecomment-1424133765

   cc @xinrong-meng @grundprinzip @ueshin @zhengruifeng PTAL


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39947: [SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #39947:
URL: https://github.com/apache/spark/pull/39947#discussion_r1101440253


##########
python/pyspark/sql/tests/pandas/test_pandas_udf.py:
##########
@@ -171,7 +171,7 @@ def test_stopiteration_in_udf(self):
         def foo(x):
             raise StopIteration()
 
-        exc_message = "Caught StopIteration thrown from user's code; failing the task"

Review Comment:
   The error message is too long, and it gets abbreviated in Connect case.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] zhenlineo commented on a diff in pull request #39947: [SPARK-40453][SPARK-41715][CONNECT] Take super class into account when throwing an exception

Posted by "zhenlineo (via GitHub)" <gi...@apache.org>.
zhenlineo commented on code in PR #39947:
URL: https://github.com/apache/spark/pull/39947#discussion_r1145575350


##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala:
##########
@@ -53,19 +59,37 @@ class SparkConnectService(debug: Boolean)
     extends SparkConnectServiceGrpc.SparkConnectServiceImplBase
     with Logging {
 
-  private def buildStatusFromThrowable[A <: Throwable with SparkThrowable](st: A): RPCStatus = {
-    val t = Option(st.getCause).getOrElse(st)
+  private def allClasses(cl: Class[_]): Seq[Class[_]] = {
+    val classes = ArrayBuffer.empty[Class[_]]
+    if (cl != null && cl != classOf[java.lang.Object]) {
+      classes.append(cl) // Includes itself.
+    }
+
+    @tailrec
+    def appendSuperClasses(clazz: Class[_]): Unit = {
+      if (clazz == null || clazz == classOf[java.lang.Object]) return
+      classes.append(clazz.getSuperclass)
+      appendSuperClasses(clazz.getSuperclass)
+    }
+
+    appendSuperClasses(cl)
+    classes
+  }
+
+  private def buildStatusFromThrowable(st: Throwable): RPCStatus = {
     RPCStatus
       .newBuilder()
       .setCode(RPCCode.INTERNAL_VALUE)
       .addDetails(
         ProtoAny.pack(
           ErrorInfo
             .newBuilder()
-            .setReason(t.getClass.getName)
+            .setReason(st.getClass.getName)
             .setDomain("org.apache.spark")
+            .putMetadata("message", StringUtils.abbreviate(st.getMessage, 2048))

Review Comment:
   @HyukjinKwon Am I understand correctly the header limit is for the HTTP header? Can we put it in the body? Is there still a ~4K limit?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org