Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/12/19 11:47:14 UTC

[GitHub] [spark] itholic opened a new pull request, #39128: [SPARK-41586][Python] Introduce new PySpark package: `pyspark.errors` and error classes.

itholic opened a new pull request, #39128:
URL: https://github.com/apache/spark/pull/39128

   ### What changes were proposed in this pull request?
   
   This PR proposes to introduce `pyspark.errors` and error classes, to unify and improve the errors generated by PySpark under a single path.
   
   This PR includes the changes below:
   - `python/pyspark/__init__.py`
     - Add new class `PySparkException`.
     - Add PySpark-specific errors that raise `PySparkException`.
   - `python/pyspark/sql/functions.py`
     - Migrate Python built-in exceptions to PySpark-specific errors.
   - `pyspark/errors/error_classes.py`
     - Add error classes to identify the PySpark-specific errors.
   - `python/pyspark/testing/utils.py`
     - Add `checkError` to test errors with `error_class` and `message_parameters` instead of the error message.
   - `python/pyspark/sql/tests/test_functions.py`
     - Add & modify the tests by using `checkError`.
   
   This is an initial PR introducing an error framework for PySpark, to facilitate error management and provide better, more consistent error messages to users.
   
   While active work is being done on the [SQL side to improve error messages](https://issues.apache.org/jira/browse/SPARK-37935), so far there has been no comparable effort for PySpark.
   
   Follow-ups to this PR include:
   - Migrate more Python built-in exceptions generated on the driver side into PySpark-specific errors.
   - Migrate the errors generated by `Py4J` into PySpark-specific errors.
   - Migrate the errors generated on the Python worker side into PySpark-specific errors.
   - Migrate more error tests into tests using `checkError`.
   - Currently, all PySpark-specific errors are defined as the `PySparkException` class. As the number of PySpark-specific errors grows, it may be necessary to refine `PySparkException` into multiple categories.
   
   ### Why are the changes needed?
   
   Centralizing error messages and introducing identified error classes provides the following benefits:
   - Errors are searchable via their unique class names and are properly classified.
   - Reduces the cost of future maintenance for PySpark errors.
   - Provides consistent and actionable error messages to users.
   - Facilitates translating error messages into different languages.
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, but only in the error messages. There are no API changes.
   
   For example,
   
   **Before**
   ```python
   >>> from pyspark.sql import functions as F
   >>> F.window("date", 5)
   Traceback (most recent call last):
   ...
   TypeError: windowDuration should be provided as a string
   ```
   
   **After**
   ```python
   >>> from pyspark.sql import functions as F
   >>> F.window("date", 5)
   Traceback (most recent call last):
   ...
   pyspark.errors.PySparkException: [NOT_A_STRING] Argument 'windowDuration' should be a string, got 'int'.
   ```
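
As a rough, self-contained sketch of the mechanics behind that message (the registry and exception names mirror the PR, but this is illustrative, not the exact PySpark code):

```python
# Rough sketch of the error-class mechanics; names mirror the PR,
# but this is illustrative, not the exact PySpark implementation.
ERROR_CLASSES = {
    "NOT_A_STRING": lambda arg_name, arg_type: (
        f"Argument '{arg_name}' should be a string, got '{arg_type}'."
    ),
}


class PySparkException(Exception):
    def __init__(self, error_class, message_parameters=None):
        self._error_class = error_class
        self._message_parameters = message_parameters or {}

    def __str__(self):
        # Look up the message template for this error class and fill it in.
        template = ERROR_CLASSES[self._error_class]
        return f"[{self._error_class}] {template(**self._message_parameters)}"


try:
    raise PySparkException(
        error_class="NOT_A_STRING",
        message_parameters={"arg_name": "windowDuration", "arg_type": "int"},
    )
except PySparkException as e:
    print(e)  # [NOT_A_STRING] Argument 'windowDuration' should be a string, got 'int'.
```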
   
   ### How was this patch tested?
   
   By adding unit tests, and via the existing static analysis tools (`dev/lint-python`).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39128: [SPARK-41586][PYTHON] Introduce new PySpark package: `pyspark.errors` and error classes.

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #39128:
URL: https://github.com/apache/spark/pull/39128#discussion_r1052751875


##########
python/pyspark/testing/utils.py:
##########
@@ -138,6 +140,32 @@ def setUpClass(cls):
     def tearDownClass(cls):
         cls.sc.stop()
 
+    def checkError(

Review Comment:
   `check_error`





[GitHub] [spark] itholic commented on a diff in pull request #39128: [SPARK-41586][Python] Introduce new PySpark package: `pyspark.errors` and error classes.

Posted by GitBox <gi...@apache.org>.
itholic commented on code in PR #39128:
URL: https://github.com/apache/spark/pull/39128#discussion_r1052123143


##########
python/pyspark/errors/error_classes.py:
##########
@@ -0,0 +1,30 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+
+ERROR_CLASSES = {
+    "COLUMN_IN_LIST": lambda func_name: f"{func_name} does not allow a column in a list",
+    "HIGHER_ORDER_FUNCTION_SHOULD_RETURN_COLUMN": lambda func_name, return_type: f"Function '{func_name}' should return Column, got {return_type}",
+    "NOT_A_COLUMN": lambda arg_name, arg_type: f"Argument '{arg_name}' should be a column, got '{arg_type}'.",
+    "NOT_A_STRING": lambda arg_name, arg_type: f"Argument '{arg_name}' should be a string, got '{arg_type}'.",
+    "NOT_COLUMN_OR_INTEGER": lambda arg_name, arg_type: f"Argument '{arg_name}' should be a column or integer, got '{arg_type}'.",
+    "NOT_COLUMN_OR_INTEGER_OR_STRING": lambda arg_name, arg_type: f"Argument '{arg_name}' should be a column or integer or string, got '{arg_type}'.",
+    "NOT_COLUMN_OR_STRING": lambda arg_name, arg_type: f"Argument '{arg_name}' should be a column or string, got '{arg_type}'.",

Review Comment:
   Errors of similar categories may need to be grouped together and managed as a sub-error class.
   
   For example, `NOT_A_COLUMN`, `NOT_A_STRING`, `NOT_COLUMN_OR_INTEGER`, `NOT_COLUMN_OR_INTEGER_OR_STRING` and `NOT_COLUMN_OR_STRING` could be defined under one parent error class, `INVALID_TYPE_FOR_ARGUMENT`.
   
   In the current PR, I want to focus the discussion on the overall idea and structure, so let me follow up with sub-error classes by adding more error classes later if needed.
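
A hypothetical sketch of that grouping (none of these names appear in the PR as-is) could key the message on the allowed types:

```python
# Hypothetical sketch: fold the NOT_* type errors into one parent
# INVALID_TYPE_FOR_ARGUMENT class, parameterized by the allowed types.
ERROR_CLASSES = {
    "INVALID_TYPE_FOR_ARGUMENT": lambda arg_name, allowed_types, arg_type: (
        f"Argument '{arg_name}' should be {allowed_types}, got '{arg_type}'."
    ),
}


def invalid_type_message(arg_name, allowed_types, arg_type):
    # Build the message from the parent error class template.
    template = ERROR_CLASSES["INVALID_TYPE_FOR_ARGUMENT"]
    return template(
        arg_name=arg_name, allowed_types=allowed_types, arg_type=arg_type.__name__
    )


# The former NOT_COLUMN_OR_STRING becomes one instantiation of the parent class.
print(invalid_type_message("windowDuration", "a column or string", int))
# Argument 'windowDuration' should be a column or string, got 'int'.
```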





[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39128: [SPARK-41586][PYTHON] Introduce new PySpark package: `pyspark.errors` and error classes.

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #39128:
URL: https://github.com/apache/spark/pull/39128#discussion_r1052751097


##########
python/pyspark/errors/__init__.py:
##########
@@ -0,0 +1,140 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from typing import Dict, Optional, Union, Any, Type
+from pyspark.errors.error_classes import ERROR_CLASSES
+
+
+class PySparkException(Exception):

Review Comment:
   Let's move all these into a separate file under `pyspark/errors` (e.g., `pyspark/errors/exceptions.py`), and only put imports here in `__init__.py`





[GitHub] [spark] itholic commented on a diff in pull request #39128: [SPARK-41586][Python] Introduce new PySpark package: `pyspark.errors` and error classes.

Posted by GitBox <gi...@apache.org>.
itholic commented on code in PR #39128:
URL: https://github.com/apache/spark/pull/39128#discussion_r1052115930


##########
python/pyspark/errors/__init__.py:
##########
@@ -0,0 +1,140 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from typing import Dict, Optional, Union, Any, Type
+from pyspark.errors.error_classes import ERROR_CLASSES
+
+
+class PySparkException(Exception):

Review Comment:
   As mentioned in the PR description, all PySpark-specific errors are currently defined as the `PySparkException` class.
   It might be necessary to refine `PySparkException` into multiple categories as the number of PySpark-specific errors increases.
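
If that refinement happens, one hypothetical shape is subclasses that also inherit the matching Python built-ins, so existing `except TypeError` handlers keep working:

```python
class PySparkException(Exception):
    """Base class for PySpark-specific errors, as in this PR."""


class PySparkTypeError(PySparkException, TypeError):
    """Hypothetical subclass for type-validation errors such as NOT_A_STRING."""


class PySparkValueError(PySparkException, ValueError):
    """Hypothetical subclass for invalid-value errors."""


err = PySparkTypeError(
    "[NOT_A_STRING] Argument 'windowDuration' should be a string, got 'int'."
)
assert isinstance(err, PySparkException)
assert isinstance(err, TypeError)  # still catchable as a built-in TypeError
```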





[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39128: [SPARK-41586][PYTHON] Introduce new PySpark package: `pyspark.errors` and error classes.

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #39128:
URL: https://github.com/apache/spark/pull/39128#discussion_r1052750600


##########
python/pyspark/errors/__init__.py:
##########
@@ -0,0 +1,140 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from typing import Dict, Optional, Union, Any, Type
+from pyspark.errors.error_classes import ERROR_CLASSES
+
+
+class PySparkException(Exception):

Review Comment:
   Should we integrate this with the exceptions defined under `pyspark.sql.utils`?



##########
python/pyspark/errors/__init__.py:
##########
@@ -0,0 +1,140 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from typing import Dict, Optional, Union, Any, Type
+from pyspark.errors.error_classes import ERROR_CLASSES
+
+
+class PySparkException(Exception):
+    """
+    Base Exception for handling the errors generated by PySpark
+    """
+
+    def __init__(self, error_class: str, message_parameters: Optional[Dict[str, str]] = None):
+        self._verify_error_class(error_class)
+        self._error_class = error_class
+
+        self._error_message_format = ERROR_CLASSES[error_class]
+
+        self._verify_message_parameters(message_parameters)
+        self._message_parameters = message_parameters
+
+    def _verify_error_class(self, error_class: str) -> None:
+        assert (
+            error_class in ERROR_CLASSES
+        ), f"{error_class} is not in the list of error classes: {list(ERROR_CLASSES.keys())}"
+
+    def _verify_message_parameters(
+        self, message_parameters: Optional[Dict[str, str]] = None
+    ) -> None:
+        required = set(self._error_message_format.__code__.co_varnames)
+        given = set() if message_parameters is None else set(message_parameters.keys())
+        assert given == required, f"Given message parameters: {given} , but {required} required"
+
+    def getErrorClass(self) -> str:
+        return self._error_class
+
+    def getMessageParameters(self) -> Optional[Dict[str, str]]:
+        return self._message_parameters
+
+    def getErrorMessage(self) -> str:
+        if self._message_parameters is None:
+            message = self._error_message_format()  # type: ignore[operator]
+        else:
+            message = self._error_message_format(
+                *self._message_parameters.values()
+            )  # type: ignore[operator]
+
+        return message
+
+    def __str__(self) -> str:
+        # The user-facing error message contains the error class and the error message,
+        # e.g. "[WRONG_NUM_COLUMNS] 'greatest' should take at least two columns"
+        return f"[{self.getErrorClass()}] {self.getErrorMessage()}"
+
+
+def notColumnOrStringError(arg_name: str, arg_type: Type[Any]) -> "PySparkException":
+    return PySparkException(
+        error_class="NOT_COLUMN_OR_STRING",
+        message_parameters={"arg_name": arg_name, "arg_type": arg_type.__name__},
+    )
+
+
+def notColumnOrIntegerError(arg_name: str, arg_type: Type[Any]) -> "PySparkException":

Review Comment:
   Can we follow the snake_case naming rule, since these are all internal?





[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39128: [SPARK-41586][PYTHON] Introduce new PySpark package: `pyspark.errors` and error classes.

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #39128:
URL: https://github.com/apache/spark/pull/39128#discussion_r1052752065


##########
python/pyspark/sql/functions.py:
##########
@@ -8122,15 +8130,13 @@ def _get_lambda_parameters(f: Callable) -> ValuesView[inspect.Parameter]:
     # Validate that
     # function arity is between 1 and 3
     if not (1 <= len(parameters) <= 3):
-        raise ValueError(
-            "f should take between 1 and 3 arguments, but provided function takes {}".format(
-                len(parameters)
-            )
+        raise invalidHigherOrderFunctionArgumentNumberError(
+            func_name=f.__name__, num_args=len(parameters)
         )
 
     # and all arguments can be used as positional
     if not all(p.kind in supported_parameter_types for p in parameters):
-        raise ValueError("f should use only POSITIONAL or POSITIONAL OR KEYWORD arguments")
+        raise invalidParameterTypeForHigherOrderFunctionError(func_name=f.__name__)

Review Comment:
   If you plan to do this in all other places, please file an umbrella JIRA.





[GitHub] [spark] itholic commented on pull request #39128: [SPARK-41586][PYTHON] Introduce new PySpark package: `pyspark.errors` and error classes.

Posted by GitBox <gi...@apache.org>.
itholic commented on PR #39128:
URL: https://github.com/apache/spark/pull/39128#issuecomment-1358810969

   Let me close this for now and re-create the PR to change the logic to re-use the JVM.




[GitHub] [spark] itholic closed pull request #39128: [SPARK-41586][PYTHON] Introduce new PySpark package: `pyspark.errors` and error classes.

Posted by GitBox <gi...@apache.org>.
itholic closed pull request #39128: [SPARK-41586][PYTHON] Introduce new PySpark package: `pyspark.errors` and error classes.
URL: https://github.com/apache/spark/pull/39128




[GitHub] [spark] itholic commented on a diff in pull request #39128: [SPARK-41586][Python] Introduce new PySpark package: `pyspark.errors` and error classes.

Posted by GitBox <gi...@apache.org>.
itholic commented on code in PR #39128:
URL: https://github.com/apache/spark/pull/39128#discussion_r1052117811


##########
python/pyspark/errors/__init__.py:
##########
@@ -0,0 +1,140 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from typing import Dict, Optional, Union, Any, Type
+from pyspark.errors.error_classes import ERROR_CLASSES
+
+
+class PySparkException(Exception):
+    """
+    Base Exception for handling the errors generated by PySpark
+    """
+
+    def __init__(self, error_class: str, message_parameters: Optional[Dict[str, str]] = None):
+        self._verify_error_class(error_class)
+        self._error_class = error_class
+
+        self._error_message_format = ERROR_CLASSES[error_class]
+
+        self._verify_message_parameters(message_parameters)
+        self._message_parameters = message_parameters
+
+    def _verify_error_class(self, error_class: str) -> None:
+        assert (
+            error_class in ERROR_CLASSES
+        ), f"{error_class} is not in the list of error classes: {list(ERROR_CLASSES.keys())}"
+
+    def _verify_message_parameters(
+        self, message_parameters: Optional[Dict[str, str]] = None
+    ) -> None:
+        required = set(self._error_message_format.__code__.co_varnames)
+        given = set() if message_parameters is None else set(message_parameters.keys())
+        assert given == required, f"Given message parameters: {given} , but {required} required"
+
+    def getErrorClass(self) -> str:
+        return self._error_class
+
+    def getMessageParameters(self) -> Optional[Dict[str, str]]:
+        return self._message_parameters
+
+    def getErrorMessage(self) -> str:
+        if self._message_parameters is None:
+            message = self._error_message_format()  # type: ignore[operator]
+        else:
+            message = self._error_message_format(
+                *self._message_parameters.values()
+            )  # type: ignore[operator]
+
+        return message
+
+    def __str__(self) -> str:
+        # The user-facing error message contains the error class and the error message,
+        # e.g. "[WRONG_NUM_COLUMNS] 'greatest' should take at least two columns"
+        return f"[{self.getErrorClass()}] {self.getErrorMessage()}"
+
+
+def notColumnOrStringError(arg_name: str, arg_type: Type[Any]) -> "PySparkException":

Review Comment:
   Error names should be sufficiently descriptive of the error.
   
   I would appreciate any comments on improving the naming.





[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39128: [SPARK-41586][PYTHON] Introduce new PySpark package: `pyspark.errors` and error classes.

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #39128:
URL: https://github.com/apache/spark/pull/39128#discussion_r1052751697


##########
python/pyspark/errors/error_classes.py:
##########
@@ -0,0 +1,30 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+
+ERROR_CLASSES = {

Review Comment:
   These files should be in JSON, and allow a user to override them before PySpark starts up, as we have done on the Scala side.
   
   Or do you propose a new way to override them on the PySpark side, by overwriting this in the library?
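
For reference, the Scala side keeps its templates in a JSON file with `<param>`-style placeholders; a minimal sketch of a JSON-backed registry on the Python side (the file content and helper below are hypothetical) could be:

```python
import json

# Hypothetical JSON content mirroring the Scala side's error-classes format:
# message arrays with <placeholder>-style parameters instead of lambdas.
ERROR_CLASSES_JSON = """
{
  "NOT_A_STRING": {
    "message": ["Argument '<arg_name>' should be a string, got '<arg_type>'."]
  }
}
"""

ERROR_CLASSES = json.loads(ERROR_CLASSES_JSON)


def get_message(error_class, message_parameters):
    # Join the message lines, then substitute each <placeholder>.
    template = "".join(ERROR_CLASSES[error_class]["message"])
    for key, value in message_parameters.items():
        template = template.replace(f"<{key}>", value)
    return template


print(get_message("NOT_A_STRING", {"arg_name": "windowDuration", "arg_type": "int"}))
# Argument 'windowDuration' should be a string, got 'int'.
```

A JSON file like this could then be shipped alongside the package (or overridden by the user) and loaded once at startup.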


