Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/12/20 10:33:43 UTC

[GitHub] [spark] itholic opened a new pull request, #39137: [SPARK-41586][SPARK-41598][PYTHON] Introduce PySpark errors package and error classes

itholic opened a new pull request, #39137:
URL: https://github.com/apache/spark/pull/39137

   ### What changes were proposed in this pull request?
   
   This PR proposes to introduce `pyspark.errors` and error classes to unify and improve the errors generated by PySpark under a single path.
   
   This PR includes the changes below:
   - Add a `PYSPARK` error class and its sub-error classes to `error-classes.json`.
   - Add `PySparkErrors` on the JVM side to leverage the existing error framework.
   - Add a new module: `pyspark.errors`.
   - Add new error helpers in `pyspark.errors.errors` that return a `PySparkException` built from the new error classes.
   - Migrate the error messages in `pyspark/sql/functions.py` to error classes.
   - Add tests for migrated error messages.
   - Add the test util `check_error` for testing errors by their error classes (see the usage sketch below).
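   
   As an illustrative, hypothetical usage sketch of such a test, where the keyword arguments of `check_error` and the error class shown are assumptions based on this PR's description rather than the exact API:
   
    ```python
    from pyspark.sql import functions as F
    from pyspark.errors import PySparkException

    def test_window_requires_string_duration(self):
        # `window` expects `windowDuration` to be a string, so passing an int
        # should raise the new PySpark error class instead of a bare TypeError.
        with self.assertRaises(PySparkException) as pe:
            F.window("date", 5)

        # Assert on the error class and message parameters rather than on the
        # rendered message text.
        self.check_error(
            exception=pe.exception,
            error_class="PYSPARK.NOT_A_STRING",
            message_parameters={"argName": "windowDuration", "argType": "int"},
        )
    ```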
   
   This is an initial PR that introduces an error framework for PySpark, to facilitate error management and provide better, more consistent error messages to users.
   
   While active work is being done on the [SQL side to improve error messages](https://issues.apache.org/jira/browse/SPARK-37935), so far there has been no corresponding effort to improve error messages in PySpark.
   
   So, I'd like this PR to also kick off the error message improvement effort on the PySpark side.
   
   **Next steps** after this PR include:
   - Migrate more Python built-in exceptions generated on the driver side into PySpark-specific errors.
   - Migrate the errors generated by `Py4J` into PySpark-specific errors.
   - Migrate the errors generated on the Python worker side into PySpark-specific errors.
   - Migrate more error tests into tests using `check_error`.
   - Currently all PySpark-specific errors are defined as the `PySparkException` class; as the number of PySpark-specific errors grows in the future, it may be necessary to refine `PySparkException` into multiple categories.
   - Add documentation.
   
   Will add more items to the [umbrella JIRA](https://issues.apache.org/jira/browse/SPARK-41597) once this initial PR gets approved.
   
   ### Why are the changes needed?
   
   Centralizing error messages and introducing identifiable error classes provides the following benefits:
   - Errors are properly classified and searchable by their unique class names.
   - Reduces the cost of future maintenance for PySpark errors.
   - Provides consistent and actionable error messages to users.
   - Facilitates translating error messages into different languages.
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, but only in the error messages; there are no API changes at all.
   
   For example,
   
   **Before**
   ```python
   >>> from pyspark.sql import functions as F
   >>> F.window("date", 5)
   Traceback (most recent call last):
   ...
   TypeError: windowDuration should be provided as a string
   ```
   
   **After**
   ```python
    >>> from pyspark.sql import functions as F
    >>> F.window("date", 5)
    Traceback (most recent call last):
   ...
   pyspark.errors.exceptions.PySparkException: [PYSPARK.NOT_A_STRING]  Argument 'windowDuration' should be a string, got 'int'.
   ```
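   
   For context, the call-site pattern in `pyspark/sql/functions.py` changes from raising ad-hoc built-in exceptions to raising the new error helpers. A simplified sketch of the `lit` example from this PR's diff (the surrounding logic of `lit` is abbreviated, and helper names may still change during review):
   
    ```python
    from pyspark.sql.column import Column
    from pyspark.errors.errors import column_in_list_error

    def lit(col):
        if isinstance(col, list) and any(isinstance(c, Column) for c in col):
            # Before this PR: raise ValueError("lit does not allow a column in a list")
            raise column_in_list_error(func_name="lit")
        # ... the rest of lit() is unchanged and omitted here ...
    ```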
   
   ### How was this patch tested?
   
   By adding unit tests and by manually running the static analysis via `dev/lint-python`.




[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39137: [SPARK-41586][SPARK-41598][PYTHON] Introduce PySpark errors package and error classes

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #39137:
URL: https://github.com/apache/spark/pull/39137#discussion_r1053896788


##########
python/pyspark/errors/exceptions.py:
##########
@@ -0,0 +1,24 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from pyspark.sql.utils import CapturedException
+
+
+class PySparkException(CapturedException):

Review Comment:
   I think the inheritance hierarchy here is a bit odd, since `PySparkException` isn't technically a `CapturedException` (i.e., captured from Py4J).
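   
   For illustration only, the decoupling hinted at here could look roughly like the sketch below, assuming `PySparkException` becomes the JVM-independent base and the Py4J-specific capture stays in a subclass (this is not the code in this PR):
   
    ```python
    class PySparkException(Exception):
        """Base class for errors raised by PySpark itself, with no JVM dependency."""


    class CapturedException(PySparkException):
        """An exception captured from the JVM via Py4J."""

        def __init__(self, desc: str, stack_trace: str):
            super().__init__(desc)
            self.desc = desc
            self.stack_trace = stack_trace
    ```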





[GitHub] [spark] zhengruifeng commented on a diff in pull request #39137: [SPARK-41586][SPARK-41598][PYTHON] Introduce PySpark errors package and error classes

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on code in PR #39137:
URL: https://github.com/apache/spark/pull/39137#discussion_r1056126655


##########
core/src/main/resources/error/error-classes.json:
##########
@@ -1046,6 +1046,63 @@
       "Protobuf type not yet supported: <protobufType>."
     ]
   },
+  "PYSPARK" : {
+    "message" : [
+      ""
+    ],
+    "subClass" : {
+      "COLUMN_IN_LIST" : {
+        "message" : [
+          "<funcName> does not allow a column in a list"
+        ]
+      },
+      "HIGHER_ORDER_FUNCTION_SHOULD_RETURN_COLUMN" : {
+        "message" : [
+          "Function `<funcName>` should return Column, got <returnType>"
+        ]
+      },
+      "NOT_A_COLUMN" : {
+        "message" : [
+          "Argument `<argName>` should be a column, got <argType>."
+        ]
+      },
+      "NOT_A_STRING" : {
+        "message" : [
+          "Argument `<argName>` should be a string, got <argType>."
+        ]
+      },
+      "NOT_COLUMN_OR_INTEGER" : {
+        "message" : [
+          "Argument `<argName>` should be a column or integer, got <argType>."

Review Comment:
   dumb question: can an error message be parameterized?



##########
sql/catalyst/src/main/scala/org/apache/spark/python/errors/PySparkErrors.scala:
##########
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.python.errors
+
+import org.apache.spark._
+
+/**
+ * Object for grouping error messages from exceptions thrown by PySpark.
+ */
+private[python] object PySparkErrors {

Review Comment:
   why do we need to touch the Scala side?





[GitHub] [spark] itholic commented on a diff in pull request #39137: [SPARK-41586][SPARK-41598][PYTHON] Introduce PySpark errors package and error classes

Posted by GitBox <gi...@apache.org>.
itholic commented on code in PR #39137:
URL: https://github.com/apache/spark/pull/39137#discussion_r1053168085


##########
core/src/main/resources/error/error-classes.json:
##########
@@ -1046,6 +1046,63 @@
       "Protobuf type not yet supported: <protobufType>."
     ]
   },
+  "PYSPARK" : {

Review Comment:
   Later, when the number of error classes grows large enough to be categorized, they can be subdivided into several error classes prefixed with PYSPARK.
   
   e.g. `PYSPARK_INVALID_TYPE`, `PYSPARK_WRONG_NUM_ARGS`, or similar.





[GitHub] [spark] itholic commented on a diff in pull request #39137: [SPARK-41586][SPARK-41598][PYTHON] Introduce PySpark errors package and error classes

Posted by GitBox <gi...@apache.org>.
itholic commented on code in PR #39137:
URL: https://github.com/apache/spark/pull/39137#discussion_r1053172074


##########
python/pyspark/errors/errors.py:
##########
@@ -0,0 +1,93 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from typing import Union, Any, Type
+from pyspark.errors.exceptions import PySparkException
+from py4j.java_gateway import JavaClass
+
+
+def _get_pyspark_errors() -> JavaClass:
+    from pyspark.sql import SparkSession
+
+    spark = SparkSession._getActiveSessionOrCreate()
+    assert spark._jvm is not None
+    return spark._jvm.org.apache.spark.python.errors.PySparkErrors
+
+
+def columnInListError(func_name: str) -> "PySparkException":

Review Comment:
   I follow the camelCase naming rule for error names to make it easier to match them with the errors defined on the JVM side (`sql/catalyst/src/main/scala/org/apache/spark/python/errors/PySparkErrors.scala`).





[GitHub] [spark] itholic commented on a diff in pull request #39137: [SPARK-41586][SPARK-41598][PYTHON] Introduce PySpark errors package and error classes

Posted by GitBox <gi...@apache.org>.
itholic commented on code in PR #39137:
URL: https://github.com/apache/spark/pull/39137#discussion_r1057091371


##########
python/pyspark/errors/errors.py:
##########
@@ -0,0 +1,93 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from typing import Union, Any, Type
+from pyspark.errors.exceptions import PySparkException
+from py4j.java_gateway import JavaClass
+
+
+def _get_pyspark_errors() -> JavaClass:
+    from pyspark.sql import SparkSession
+
+    spark = SparkSession._getActiveSessionOrCreate()
+    assert spark._jvm is not None
+    return spark._jvm.org.apache.spark.python.errors.PySparkErrors

Review Comment:
   Yup, that's correct.
   So I think we should design the error handling logic for Spark Connect separately.
   That is one of the follow-ups I'm planning.
   We may need to build a Python-specific error framework to cover cases such as Spark Connect.






[GitHub] [spark] itholic commented on a diff in pull request #39137: [SPARK-41586][SPARK-41598][PYTHON] Introduce PySpark errors package and error classes

Posted by GitBox <gi...@apache.org>.
itholic commented on code in PR #39137:
URL: https://github.com/apache/spark/pull/39137#discussion_r1053176334


##########
python/pyspark/errors/exceptions.py:
##########
@@ -0,0 +1,24 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from pyspark.sql.utils import CapturedException
+
+
+class PySparkException(CapturedException):

Review Comment:
   Currently, `pyspark.sql.utils` defines some classes to handle the errors from the Python worker and Py4J.
   I plan to integrate them into the `pyspark.errors` package as a follow-up when I work on migrating the errors generated by the Python worker and Py4J.





[GitHub] [spark] itholic commented on a diff in pull request #39137: [SPARK-41586][SPARK-41598][PYTHON] Introduce PySpark errors package and error classes

Posted by GitBox <gi...@apache.org>.
itholic commented on code in PR #39137:
URL: https://github.com/apache/spark/pull/39137#discussion_r1053169754


##########
python/pyspark/errors/exceptions.py:
##########
@@ -0,0 +1,24 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from pyspark.sql.utils import CapturedException
+
+
+class PySparkException(CapturedException):

Review Comment:
   As mentioned in the PR description, currently all PySpark-specific errors are defined as the `PySparkException` class.
   It might be necessary to refine `PySparkException` into multiple categories as the number of PySpark-specific errors grows in the future.





[GitHub] [spark] itholic commented on a diff in pull request #39137: [SPARK-41586][SPARK-41598][PYTHON] Introduce PySpark errors package and error classes

Posted by GitBox <gi...@apache.org>.
itholic commented on code in PR #39137:
URL: https://github.com/apache/spark/pull/39137#discussion_r1053903131


##########
core/src/main/resources/error/error-classes.json:
##########
@@ -1046,6 +1046,63 @@
       "Protobuf type not yet supported: <protobufType>."
     ]
   },
+  "PYSPARK" : {
+    "message" : [
+      ""
+    ],
+    "subClass" : {
+      "COLUMN_IN_LIST" : {
+        "message" : [
+          "<funcName> does not allow a column in a list"
+        ]
+      },
+      "HIGHER_ORDER_FUNCTION_SHOULD_RETURN_COLUMN" : {
+        "message" : [
+          "Function '<funcName>' should return Column, got <returnType>"

Review Comment:
   Thanks for the comments, @MaxGekk. Let me address them soon.





[GitHub] [spark] itholic commented on pull request #39137: [SPARK-41586][SPARK-41598][PYTHON] Introduce PySpark errors package and error classes

Posted by GitBox <gi...@apache.org>.
itholic commented on PR #39137:
URL: https://github.com/apache/spark/pull/39137#issuecomment-1365018902

   Thanks @grundprinzip for the review.
   I agree with your comments and feel they're pretty reasonable.
   
   Actually, I once submitted a PR that implemented the framework on the PySpark side (https://github.com/apache/spark/pull/39128) with no dependency on the JVM.
   
   But I closed the previous one and opened this PR instead, for the following reasons:
   1. I worried that it might not be easy to maintain if the rules on one side (PySpark vs. JVM) were changed arbitrarily in the future. So, I wanted to manage all errors in a single error class file (`error-class.json`) across the entire Apache Spark project to reduce the management cost.
   2. I thought there might be an advantage in that we can simply reuse an existing error class as it is, without adding a new one, when a similar error is already defined on the JVM side.
   3. Like the functions in `functions.py`, most of PySpark's functions leverage the JVM's logic, so it is assumed that the JVM will run at least once. So I thought the expected overhead of calling the error helper implemented on the JVM side is acceptable.
   
   But regardless of these reasons, I think all of your comments are pretty reasonable too.
   
   So, could you take a rough look at the changes in the [previous PR](https://github.com/apache/spark/pull/39128) when you find some time?
   
   If the approach of the previous PR, which implements separate logic on the PySpark side without relying on the JVM, feels more reasonable to you, let me reconsider the overall design.
   
   also cc @HyukjinKwon FYI





[GitHub] [spark] itholic commented on a diff in pull request #39137: [SPARK-41586][SPARK-41598][PYTHON] Introduce PySpark errors package and error classes

Posted by GitBox <gi...@apache.org>.
itholic commented on code in PR #39137:
URL: https://github.com/apache/spark/pull/39137#discussion_r1053172074


##########
python/pyspark/errors/errors.py:
##########
@@ -0,0 +1,93 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from typing import Union, Any, Type
+from pyspark.errors.exceptions import PySparkException
+from py4j.java_gateway import JavaClass
+
+
+def _get_pyspark_errors() -> JavaClass:
+    from pyspark.sql import SparkSession
+
+    spark = SparkSession._getActiveSessionOrCreate()
+    assert spark._jvm is not None
+    return spark._jvm.org.apache.spark.python.errors.PySparkErrors
+
+
+def columnInListError(func_name: str) -> "PySparkException":

Review Comment:
   ~~I follow the camelCase naming rule for error names to make it easier to match them with the errors defined on the JVM side (`sql/catalyst/src/main/scala/org/apache/spark/python/errors/PySparkErrors.scala`)~~
   
   We use snake_case according to https://github.com/apache/spark/pull/39137#discussion_r1053897189.






[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39137: [SPARK-41586][SPARK-41598][PYTHON] Introduce PySpark errors package and error classes

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #39137:
URL: https://github.com/apache/spark/pull/39137#discussion_r1053896291


##########
python/pyspark/sql/utils.py:
##########
@@ -102,6 +102,17 @@ def getSqlState(self) -> Optional[str]:
         else:
             return None
 
+    def getMessageParameters(self) -> Optional[dict]:

Review Comment:
   Should `CapturedException` inherit from `PySparkException` here?





[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39137: [SPARK-41586][SPARK-41598][PYTHON] Introduce PySpark errors package and error classes

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #39137:
URL: https://github.com/apache/spark/pull/39137#discussion_r1053897189


##########
python/pyspark/errors/errors.py:
##########
@@ -0,0 +1,93 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from typing import Union, Any, Type
+from pyspark.errors.exceptions import PySparkException
+from py4j.java_gateway import JavaClass
+
+
+def _get_pyspark_errors() -> JavaClass:
+    from pyspark.sql import SparkSession
+
+    spark = SparkSession._getActiveSessionOrCreate()
+    assert spark._jvm is not None
+    return spark._jvm.org.apache.spark.python.errors.PySparkErrors
+
+
+def columnInListError(func_name: str) -> "PySparkException":
+    pyspark_errors = _get_pyspark_errors()
+    e = pyspark_errors.columnInListError(func_name)
+    return PySparkException(origin=e)
+
+
+def higherOrderFunctionShouldReturnColumnError(
+    func_name: str, return_type: Type[Any]
+) -> "PySparkException":
+    pyspark_errors = _get_pyspark_errors()
+    e = pyspark_errors.higherOrderFunctionShouldReturnColumnError(func_name, return_type.__name__)
+    return PySparkException(origin=e)
+
+
+def notColumnError(arg_name: str, arg_type: Type[Any]) -> "PySparkException":
+    pyspark_errors = _get_pyspark_errors()
+    e = pyspark_errors.notColumnError(arg_name, arg_type.__name__)
+    return PySparkException(origin=e)
+
+
+def notStringError(arg_name: str, arg_type: Type[Any]) -> "PySparkException":
+    pyspark_errors = _get_pyspark_errors()
+    e = pyspark_errors.notStringError(arg_name, arg_type.__name__)
+    return PySparkException(origin=e)
+
+
+def notColumnOrIntegerError(arg_name: str, arg_type: Type[Any]) -> "PySparkException":
+    pyspark_errors = _get_pyspark_errors()
+    e = pyspark_errors.notColumnOrIntegerError(arg_name, arg_type.__name__)
+    return PySparkException(origin=e)
+
+
+def notColumnOrIntegerOrStringError(arg_name: str, arg_type: Type[Any]) -> "PySparkException":
+    pyspark_errors = _get_pyspark_errors()
+    e = pyspark_errors.notColumnOrIntegerOrStringError(arg_name, arg_type.__name__)
+    return PySparkException(origin=e)
+
+
+def notColumnOrStringError(arg_name: str, arg_type: Type[Any]) -> "PySparkException":
+    pyspark_errors = _get_pyspark_errors()
+    e = pyspark_errors.notColumnOrStringError(arg_name, arg_type.__name__)
+    return PySparkException(origin=e)
+
+
+def unsupportedParamTypeForHigherOrderFunctionError(func_name: str) -> "PySparkException":
+    pyspark_errors = _get_pyspark_errors()
+    e = pyspark_errors.unsupportedParamTypeForHigherOrderFunctionError(func_name)
+    return PySparkException(origin=e)
+
+
+def invalidNumberOfColumnsError(func_name: str) -> "PySparkException":

Review Comment:
   Shall we follow snake_case naming here, since these are internal?





[GitHub] [spark] itholic commented on a diff in pull request #39137: [SPARK-41586][SPARK-41598][PYTHON] Introduce PySpark errors package and error classes

Posted by GitBox <gi...@apache.org>.
itholic commented on code in PR #39137:
URL: https://github.com/apache/spark/pull/39137#discussion_r1057030425


##########
core/src/main/resources/error/error-classes.json:
##########
@@ -1046,6 +1046,63 @@
       "Protobuf type not yet supported: <protobufType>."
     ]
   },
+  "PYSPARK" : {
+    "message" : [
+      ""
+    ],
+    "subClass" : {
+      "COLUMN_IN_LIST" : {
+        "message" : [
+          "<funcName> does not allow a column in a list"
+        ]
+      },
+      "HIGHER_ORDER_FUNCTION_SHOULD_RETURN_COLUMN" : {
+        "message" : [
+          "Function `<funcName>` should return Column, got <returnType>"
+        ]
+      },
+      "NOT_A_COLUMN" : {
+        "message" : [
+          "Argument `<argName>` should be a column, got <argType>."
+        ]
+      },
+      "NOT_A_STRING" : {
+        "message" : [
+          "Argument `<argName>` should be a string, got <argType>."
+        ]
+      },
+      "NOT_COLUMN_OR_INTEGER" : {
+        "message" : [
+          "Argument `<argName>` should be a column or integer, got <argType>."

Review Comment:
   Yeah, there is a framework on the JVM side to handle this logic.
   The parameters in `error-classes.json` are filled in by `SparkThrowable` and `SparkThrowableHelper`.
   All exceptions should inherit `SparkThrowable` to leverage this centralized error message framework.
   You can check the [Guidelines](https://github.com/apache/spark/tree/master/core/src/main/resources/error) for more detail :-)
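   
   As a tiny, purely illustrative Python snippet of what that parameterization means conceptually (the real substitution happens on the JVM side):
   
    ```python
    # Template text copied from the PYSPARK.NOT_A_STRING entry above.
    template = "Argument `<argName>` should be a string, got <argType>."
    params = {"argName": "windowDuration", "argType": "int"}

    # Each <name> placeholder is replaced by the supplied parameter value.
    for name, value in params.items():
        template = template.replace(f"<{name}>", value)

    print(template)
    # Argument `windowDuration` should be a string, got int.
    ```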





[GitHub] [spark] zhengruifeng commented on a diff in pull request #39137: [SPARK-41586][SPARK-41598][PYTHON] Introduce PySpark errors package and error classes

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on code in PR #39137:
URL: https://github.com/apache/spark/pull/39137#discussion_r1057071624


##########
python/pyspark/errors/errors.py:
##########
@@ -0,0 +1,93 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from typing import Union, Any, Type
+from pyspark.errors.exceptions import PySparkException
+from py4j.java_gateway import JavaClass
+
+
+def _get_pyspark_errors() -> JavaClass:
+    from pyspark.sql import SparkSession
+
+    spark = SparkSession._getActiveSessionOrCreate()
+    assert spark._jvm is not None
+    return spark._jvm.org.apache.spark.python.errors.PySparkErrors

Review Comment:
   If we are going to support Spark Connect, I guess we cannot invoke the JVM side in the Python client.





[GitHub] [spark] grundprinzip commented on a diff in pull request #39137: [SPARK-41586][SPARK-41598][PYTHON] Introduce PySpark errors package and error classes

Posted by GitBox <gi...@apache.org>.
grundprinzip commented on code in PR #39137:
URL: https://github.com/apache/spark/pull/39137#discussion_r1057097518


##########
sql/catalyst/src/main/scala/org/apache/spark/python/errors/PySparkErrors.scala:
##########
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.python.errors
+
+import org.apache.spark._
+
+/**
+ * Object for grouping error messages from exceptions thrown by PySpark.
+ */
+private[python] object PySparkErrors {

Review Comment:
   Wouldn't it be better to simply leverage the intent of the error classes rather than pushing yet another link to the JVM? Why not just add an error class JSON file in PySpark?
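   
   For the sake of discussion, a minimal sketch of that idea, assuming an error class JSON file shipped inside the `pyspark.errors` package with the same `PYSPARK`/`subClass` structure as `error-classes.json` (the file location and helper name are assumptions, not part of this PR):
   
    ```python
    import json
    import os

    # Hypothetical packaged copy of the error definitions.
    _ERROR_CLASSES_PATH = os.path.join(os.path.dirname(__file__), "error-classes.json")


    def get_message(error_class: str, sub_class: str, parameters: dict) -> str:
        with open(_ERROR_CLASSES_PATH) as f:
            error_classes = json.load(f)
        template = "".join(
            error_classes[error_class]["subClass"][sub_class]["message"]
        )
        for name, value in parameters.items():
            template = template.replace(f"<{name}>", value)
        return f"[{error_class}.{sub_class}] {template}"
    ```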





[GitHub] [spark] grundprinzip commented on pull request #39137: [SPARK-41586][SPARK-41598][PYTHON] Introduce PySpark errors package and error classes

Posted by GitBox <gi...@apache.org>.
grundprinzip commented on PR #39137:
URL: https://github.com/apache/spark/pull/39137#issuecomment-1365099279

   > * I worried that it might not be easy to maintain if the rules on one side (PySpark vs. JVM) were changed arbitrarily in the future. So, I wanted to manage all errors in a single error class file (`error-class.json`) across the entire Apache Spark project to reduce the management cost.
   
   I think it's fine to reuse the `error-class.json` file. Let's figure out a way to bundle it as part of the PySpark build. If we can't do that, let's add a `python-error-class.json` and verify its integrity in `SparkThrowableSuite` the same way as we do for the `error-class.json`. 
   
   > * I thought there might be an advantage in that we can simply reuse an existing error class as it is, without adding a new one, when a similar error is already defined on the JVM side.
   
   I'm not sure how much of a benefit we get here. All server-side exceptions are thrown as SparkExceptions anyway and will be properly handled; this is where the `CapturedException` comes in. For client-side exceptions, we should use conceptually the same approach, but in a language-idiomatic way.
   
   > * Like the functions in `functions.py`, most of PySpark's functions leverage the JVM's logic, so it is assumed that the JVM will run at least once. So I thought the expected overhead of calling the error helper implemented on the JVM side is acceptable.
   
   The functions that are handled in this PR are purely client side and should not require a round trip to the JVM.
   
   
   Since the primary reason for `SparkThrowableHelper` is to provide a lookup into a JSON file and map parameters correctly, I feel this behavior should be replicated in all of the clients (PySpark, PySpark with Spark Connect, Scala for Spark Connect, etc.) while following the same message format.
   
   @HyukjinKwon what is your opinion?
   
   




[GitHub] [spark] grundprinzip commented on a diff in pull request #39137: [SPARK-41586][SPARK-41598][PYTHON] Introduce PySpark errors package and error classes

Posted by GitBox <gi...@apache.org>.
grundprinzip commented on code in PR #39137:
URL: https://github.com/apache/spark/pull/39137#discussion_r1057107476


##########
python/pyspark/errors/__init__.py:
##########
@@ -0,0 +1,33 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from pyspark.errors.errors import *  # noqa: F403
+from pyspark.errors.exceptions import PySparkException
+
+__all__ = [  # noqa: F405
+    "PySparkException",
+    "columnInListError",
+    "higherOrderFunctionShouldReturnColumnError",
+    "notColumnError",

Review Comment:
   Exporting a function that returns an instance of an error seems weird and indicates that the constructor is not well designed. 
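   
   For illustration, the constructor-centric alternative implied here might look like the sketch below, where the error class and parameters are passed straight to the exception so call sites raise it directly instead of importing one factory function per error (a hypothetical design, not the code in this PR):
   
    ```python
    class PySparkException(Exception):
        """Sketch: the error class and parameters live on the exception itself."""

        def __init__(self, error_class: str, message_parameters: dict):
            self.error_class = error_class
            self.message_parameters = message_parameters
            super().__init__(f"[{error_class}] {message_parameters}")


    def _require_string(arg_name: str, value) -> str:
        if isinstance(value, str):
            return value
        raise PySparkException(
            error_class="PYSPARK.NOT_A_STRING",
            message_parameters={"argName": arg_name, "argType": type(value).__name__},
        )
    ```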



##########
python/pyspark/errors/exceptions.py:
##########
@@ -0,0 +1,92 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from typing import cast, Optional
+
+from pyspark import SparkContext
+
+from py4j.java_gateway import JavaObject, is_instance_of
+
+
+class PySparkException(Exception):

Review Comment:
   Given that Spark Connect does not use the JVM, shouldn't the most abstract error be one without a JVM dependency?



##########
python/pyspark/errors/errors.py:
##########
@@ -0,0 +1,93 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from typing import Union, Any, Type
+from pyspark.errors.exceptions import PySparkException
+from py4j.java_gateway import JavaClass
+
+
+def _get_pyspark_errors() -> JavaClass:
+    from pyspark.sql import SparkSession
+
+    spark = SparkSession._getActiveSessionOrCreate()
+    assert spark._jvm is not None
+    return spark._jvm.org.apache.spark.python.errors.PySparkErrors
+
+
+def column_in_list_error(func_name: str) -> "PySparkException":
+    pyspark_errors = _get_pyspark_errors()
+    e = pyspark_errors.columnInListError(func_name)

Review Comment:
   Invoking a function on a Scala object just to access a JSON file feels wrong to me.



##########
python/pyspark/sql/utils.py:
##########
@@ -73,39 +74,6 @@ def __init__(
             self.cause = convert_exception(origin.getCause())
         self._origin = origin
 
-    def __str__(self) -> str:

Review Comment:
   I think it might make sense to move this back. The captured exception is actually thrown from the JVM, which is where a JVM backtrace is actually present, so it belongs here rather than in the parent class.



##########
python/pyspark/errors/exceptions.py:
##########
@@ -0,0 +1,92 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from typing import cast, Optional
+
+from pyspark import SparkContext
+
+from py4j.java_gateway import JavaObject, is_instance_of
+
+
+class PySparkException(Exception):
+    """
+    Base Exception for handling exceptions generated from PySpark.
+    """
+
+    def __init__(
+        self,
+        desc: Optional[str] = None,
+        stackTrace: Optional[str] = None,
+        origin: Optional[JavaObject] = None,
+    ):
+        # desc & stackTrace vs origin are mutually exclusive.
+        assert (origin is not None and desc is None and stackTrace is None) or (
+            origin is None and desc is not None and stackTrace is not None
+        )
+
+        self.desc = desc if desc is not None else cast(JavaObject, origin).getMessage()
+        assert SparkContext._jvm is not None
+        self.stackTrace = (
+            stackTrace
+            if stackTrace is not None
+            else (SparkContext._jvm.org.apache.spark.util.Utils.exceptionString(origin))
+        )
+        self._origin = origin
+
+    def getMessageParameters(self) -> Optional[dict]:
+        assert SparkContext._gateway is not None
+
+        gw = SparkContext._gateway
+        if self._origin is not None and is_instance_of(
+            gw, self._origin, "org.apache.spark.SparkThrowable"
+        ):
+            return self._origin.getMessageParameters()
+        else:
+            return None
+
+    def __str__(self) -> str:
+        assert SparkContext._jvm is not None
+
+        jvm = SparkContext._jvm
+        sql_conf = jvm.org.apache.spark.sql.internal.SQLConf.get()
+        debug_enabled = sql_conf.pysparkJVMStacktraceEnabled()
+        desc = self.desc
+        if debug_enabled:
+            desc = desc + "\n\nJVM stacktrace:\n%s" % self.stackTrace

Review Comment:
   Why is this a JVM backtrace?



##########
python/pyspark/sql/functions.py:
##########
@@ -172,7 +184,7 @@ def lit(col: Any) -> Column:
         return col
     elif isinstance(col, list):
         if any(isinstance(c, Column) for c in col):
-            raise ValueError("lit does not allow a column in a list")
+            raise column_in_list_error(func_name="lit")

Review Comment:
   I find that readability is reduced with those function names.





[GitHub] [spark] itholic commented on a diff in pull request #39137: [SPARK-41586][SPARK-41598][PYTHON] Introduce PySpark errors package and error classes

Posted by GitBox <gi...@apache.org>.
itholic commented on code in PR #39137:
URL: https://github.com/apache/spark/pull/39137#discussion_r1057031297


##########
sql/catalyst/src/main/scala/org/apache/spark/python/errors/PySparkErrors.scala:
##########
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.python.errors
+
+import org.apache.spark._
+
+/**
+ * Object for grouping error messages from exceptions thrown by PySpark.
+ */
+private[python] object PySparkErrors {

Review Comment:
   Because we want to leverage the existing centralized error framework and its error classes on the JVM side.
   You can refer to [QueryExecutionErrors.scala](https://github.com/apache/spark/tree/master/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala) for an example.






[GitHub] [spark] MaxGekk commented on a diff in pull request #39137: [SPARK-41586][SPARK-41598][PYTHON] Introduce PySpark errors package and error classes

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on code in PR #39137:
URL: https://github.com/apache/spark/pull/39137#discussion_r1053437112


##########
core/src/main/resources/error/error-classes.json:
##########
@@ -1046,6 +1046,63 @@
       "Protobuf type not yet supported: <protobufType>."
     ]
   },
+  "PYSPARK" : {
+    "message" : [
+      ""
+    ],
+    "subClass" : {
+      "COLUMN_IN_LIST" : {
+        "message" : [
+          "<funcName> does not allow a column in a list"
+        ]
+      },
+      "HIGHER_ORDER_FUNCTION_SHOULD_RETURN_COLUMN" : {
+        "message" : [
+          "Function '<funcName>' should return Column, got <returnType>"

Review Comment:
   `'<funcName>'` should be quoted by backticks. 



##########
core/src/main/resources/error/error-classes.json:
##########
@@ -1046,6 +1046,63 @@
       "Protobuf type not yet supported: <protobufType>."
     ]
   },
+  "PYSPARK" : {
+    "message" : [
+      ""
+    ],
+    "subClass" : {
+      "COLUMN_IN_LIST" : {
+        "message" : [
+          "<funcName> does not allow a column in a list"
+        ]
+      },
+      "HIGHER_ORDER_FUNCTION_SHOULD_RETURN_COLUMN" : {
+        "message" : [
+          "Function '<funcName>' should return Column, got <returnType>"
+        ]
+      },
+      "NOT_A_COLUMN" : {
+        "message" : [
+          "Argument '<argName>' should be a column, got '<argType>'."
+        ]
+      },
+      "NOT_A_STRING" : {
+        "message" : [
+          "Argument '<argName>' should be a string, got '<argType>'."

Review Comment:
   nit: Use backticks.



##########
core/src/main/resources/error/error-classes.json:
##########
@@ -1046,6 +1046,63 @@
       "Protobuf type not yet supported: <protobufType>."
     ]
   },
+  "PYSPARK" : {
+    "message" : [
+      ""
+    ],
+    "subClass" : {
+      "COLUMN_IN_LIST" : {
+        "message" : [
+          "<funcName> does not allow a column in a list"
+        ]
+      },
+      "HIGHER_ORDER_FUNCTION_SHOULD_RETURN_COLUMN" : {
+        "message" : [
+          "Function '<funcName>' should return Column, got <returnType>"
+        ]
+      },
+      "NOT_A_COLUMN" : {
+        "message" : [
+          "Argument '<argName>' should be a column, got '<argType>'."

Review Comment:
   Please, remove `''` around `argName`, and use backticks, see `toSQLId()`





[GitHub] [spark] itholic closed pull request #39137: [SPARK-41586][SPARK-41598][PYTHON] Introduce PySpark errors package and error classes

Posted by GitBox <gi...@apache.org>.
itholic closed pull request #39137: [SPARK-41586][SPARK-41598][PYTHON] Introduce PySpark errors package and error classes
URL: https://github.com/apache/spark/pull/39137




[GitHub] [spark] itholic commented on a diff in pull request #39137: [SPARK-41586][SPARK-41598][PYTHON] Introduce PySpark errors package and error classes

Posted by GitBox <gi...@apache.org>.
itholic commented on code in PR #39137:
URL: https://github.com/apache/spark/pull/39137#discussion_r1053902730


##########
python/pyspark/errors/exceptions.py:
##########
@@ -0,0 +1,24 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from pyspark.sql.utils import CapturedException
+
+
+class PySparkException(CapturedException):

Review Comment:
   That makes sense; `PySparkException` should be higher up in the hierarchy than `CapturedException`.
   
   Thanks for catching this! Let me update it.
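   
   For reference, a minimal sketch of the hierarchy direction agreed on here (class bodies elided; having the base live in `pyspark.errors` and `CapturedException` subclass it is the implied change, not the code as posted in this PR):
   
   ```python
   class PySparkException(Exception):
       """Base class for all PySpark-specific errors (sketch)."""


   class CapturedException(PySparkException):
       """Exceptions captured from the JVM then sit below the PySpark base
       rather than above it (sketch only)."""
   ```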





[GitHub] [spark] itholic commented on a diff in pull request #39137: [SPARK-41586][SPARK-41598][PYTHON] Introduce PySpark errors package and error classes

Posted by GitBox <gi...@apache.org>.
itholic commented on code in PR #39137:
URL: https://github.com/apache/spark/pull/39137#discussion_r1053903131


##########
core/src/main/resources/error/error-classes.json:
##########
@@ -1046,6 +1046,63 @@
       "Protobuf type not yet supported: <protobufType>."
     ]
   },
+  "PYSPARK" : {
+    "message" : [
+      ""
+    ],
+    "subClass" : {
+      "COLUMN_IN_LIST" : {
+        "message" : [
+          "<funcName> does not allow a column in a list"
+        ]
+      },
+      "HIGHER_ORDER_FUNCTION_SHOULD_RETURN_COLUMN" : {
+        "message" : [
+          "Function '<funcName>' should return Column, got <returnType>"

Review Comment:
   Thanks for the comments, @MaxGekk. Let me address them soon.



##########
core/src/main/resources/error/error-classes.json:
##########
@@ -1046,6 +1046,63 @@
       "Protobuf type not yet supported: <protobufType>."
     ]
   },
+  "PYSPARK" : {
+    "message" : [
+      ""
+    ],
+    "subClass" : {
+      "COLUMN_IN_LIST" : {
+        "message" : [
+          "<funcName> does not allow a column in a list"
+        ]
+      },
+      "HIGHER_ORDER_FUNCTION_SHOULD_RETURN_COLUMN" : {
+        "message" : [
+          "Function '<funcName>' should return Column, got <returnType>"

Review Comment:
   Thanks for the comments, @MaxGekk. Let me address them soon.





[GitHub] [spark] itholic commented on a diff in pull request #39137: [SPARK-41586][SPARK-41598][PYTHON] Introduce PySpark errors package and error classes

Posted by GitBox <gi...@apache.org>.
itholic commented on code in PR #39137:
URL: https://github.com/apache/spark/pull/39137#discussion_r1053170299


##########
python/pyspark/errors/errors.py:
##########
@@ -0,0 +1,93 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from typing import Union, Any, Type
+from pyspark.errors.exceptions import PySparkException
+from py4j.java_gateway import JavaClass
+
+
+def _get_pyspark_errors() -> JavaClass:
+    from pyspark.sql import SparkSession
+
+    spark = SparkSession._getActiveSessionOrCreate()
+    assert spark._jvm is not None
+    return spark._jvm.org.apache.spark.python.errors.PySparkErrors
+
+
+def columnInListError(func_name: str) -> "PySparkException":

Review Comment:
   The name of each error should be sufficiently descriptive of the error.
   
   I would appreciate any comments on improving the naming.

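   For context, the body of such a helper presumably calls the matching JVM method and wraps the resulting throwable; a rough sketch under that assumption (the JVM method name mirroring the Python one and the wrapping step are both guesses):

   ```python
   def columnInListError(func_name: str) -> "PySparkException":
       # Ask the JVM-side PySparkErrors object to build the exception carrying
       # the PYSPARK.COLUMN_IN_LIST error class (method name assumed).
       pyspark_errors = _get_pyspark_errors()
       jvm_error = pyspark_errors.columnInListError(func_name)
       # How the JVM throwable becomes a PySparkException depends on its
       # constructor in this PR; forwarding the rendered message is an assumption.
       return PySparkException(jvm_error.getMessage())
   ```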




[GitHub] [spark] itholic commented on a diff in pull request #39137: [SPARK-41586][SPARK-41598][PYTHON] Introduce PySpark errors package and error classes

Posted by GitBox <gi...@apache.org>.
itholic commented on code in PR #39137:
URL: https://github.com/apache/spark/pull/39137#discussion_r1053172074


##########
python/pyspark/errors/errors.py:
##########
@@ -0,0 +1,93 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from typing import Union, Any, Type
+from pyspark.errors.exceptions import PySparkException
+from py4j.java_gateway import JavaClass
+
+
+def _get_pyspark_errors() -> JavaClass:
+    from pyspark.sql import SparkSession
+
+    spark = SparkSession._getActiveSessionOrCreate()
+    assert spark._jvm is not None
+    return spark._jvm.org.apache.spark.python.errors.PySparkErrors
+
+
+def columnInListError(func_name: str) -> "PySparkException":

Review Comment:
   I follow the camelCase naming rule here for error names, to facilitate matching with the errors defined on the JVM side (`sql/catalyst/src/main/scala/org/apache/spark/python/errors/PySparkErrors.scala`).



##########
python/pyspark/errors/errors.py:
##########
@@ -0,0 +1,93 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from typing import Union, Any, Type
+from pyspark.errors.exceptions import PySparkException
+from py4j.java_gateway import JavaClass
+
+
+def _get_pyspark_errors() -> JavaClass:
+    from pyspark.sql import SparkSession
+
+    spark = SparkSession._getActiveSessionOrCreate()
+    assert spark._jvm is not None
+    return spark._jvm.org.apache.spark.python.errors.PySparkErrors
+
+
+def columnInListError(func_name: str) -> "PySparkException":

Review Comment:
   I follow the camelCase naming rule for error names, to facilitate matching with the errors defined on the JVM side (`sql/catalyst/src/main/scala/org/apache/spark/python/errors/PySparkErrors.scala`).







[GitHub] [spark] itholic commented on a diff in pull request #39137: [SPARK-41586][SPARK-41598][PYTHON] Introduce PySpark errors package and error classes

Posted by GitBox <gi...@apache.org>.
itholic commented on code in PR #39137:
URL: https://github.com/apache/spark/pull/39137#discussion_r1057091531


##########
python/pyspark/errors/errors.py:
##########
@@ -0,0 +1,93 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from typing import Union, Any, Type
+from pyspark.errors.exceptions import PySparkException
+from py4j.java_gateway import JavaClass
+
+
+def _get_pyspark_errors() -> JavaClass:
+    from pyspark.sql import SparkSession
+
+    spark = SparkSession._getActiveSessionOrCreate()
+    assert spark._jvm is not None
+    return spark._jvm.org.apache.spark.python.errors.PySparkErrors

Review Comment:
   And that's why the current CI is failing 😂 
   I'm taking a look at this one.
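   
   To make the coupling concrete: every helper goes through `SparkSession._getActiveSessionOrCreate()`, so rendering even a purely Python-side argument error needs a reachable JVM gateway. One defensive sketch of how that dependency could be made optional (not what this PR does; the fallback message and constructor are assumptions):
   
   ```python
   def column_in_list_error(func_name: str) -> PySparkException:
       # Sketch of a JVM-optional variant: fall back to a locally rendered message
       # when no gateway/session is available, so argument validation in
       # functions.py does not require a running JVM (e.g. in lint-only jobs).
       try:
           pyspark_errors = _get_pyspark_errors()
       except Exception:
           return PySparkException(
               f"[PYSPARK.COLUMN_IN_LIST] {func_name} does not allow a column in a list"
           )
       return PySparkException(pyspark_errors.columnInListError(func_name).getMessage())
   ```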


