Posted to reviews@spark.apache.org by "nchammas (via GitHub)" <gi...@apache.org> on 2024/01/28 21:43:04 UTC

[PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

nchammas opened a new pull request, #44920:
URL: https://github.com/apache/spark/pull/44920

   ### What changes were proposed in this pull request?
   
   Move PySpark error conditions into a standalone JSON file.
   
   ### Why are the changes needed?
   
   Having the JSON in its own file enables better IDE support for editing and managing it. It also simplifies the logic to regenerate the JSON.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Existing unit tests.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.



Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

Posted by "nchammas (via GitHub)" <gi...@apache.org>.
nchammas commented on code in PR #44920:
URL: https://github.com/apache/spark/pull/44920#discussion_r1584104489


##########
python/pyspark/errors/exceptions/__init__.py:
##########
@@ -18,39 +18,15 @@
 
 def _write_self() -> None:
     import json
+    from pathlib import Path
     from pyspark.errors import error_classes
 
-    with open("python/pyspark/errors/error_classes.py", "w") as f:
-        error_class_py_file = """#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#    http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-
-# NOTE: Automatically sort this file via
-# - cd $SPARK_HOME
-# - bin/pyspark
-# - from pyspark.errors.exceptions import _write_self; _write_self()
-import json
-
-
-ERROR_CLASSES_JSON = '''
-%s
-'''
+    ERRORS_DIR = Path(__file__).parents[1]
 
-ERROR_CLASSES_MAP = json.loads(ERROR_CLASSES_JSON)
-""" % json.dumps(
-            error_classes.ERROR_CLASSES_MAP, sort_keys=True, indent=2
+    with open(ERRORS_DIR / "error-conditions.json", "w") as f:

Review Comment:
   Hmm, I don't understand the concern. This method here is `_write_self()`. It's for development only. No user will run this when they install Spark, regardless of the installation method. That's what I was saying in my [earlier comment on this method][1].
   
   The real code path we care about is [in `error_classes.py`][2], not `__init__.py`. And this is the code path that I tested in various ways and documented in the PR description.
   
   I tested the zip installation method you were particularly concerned about in point 5:
   
   (Screenshot, 2024-04-30 12:12 AM: https://github.com/apache/spark/assets/1039369/07884c50-8cd6-4caf-8bb9-b0269f40eb54)
   
   Is there something about that test you think is inadequate?
   
   [1]: https://github.com/apache/spark/pull/44920/files/010714d00b84d7e9edb61170cf35d176cacfb67d#r1470557657
   
   [2]: https://github.com/apache/spark/pull/44920/files/010714d00b84d7e9edb61170cf35d176cacfb67d#diff-2823e146fc0e6bddff3505b5bee6e2b855782d9f71e900e6f9099fc97d1fffa6
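
   For reference, a minimal sketch of the rewritten development-only helper, reconstructed from the diff above; the exact `json.dump` arguments are an assumption, since the diff is truncated at the `with open(...)` line:

   ```python
   def _write_self() -> None:
       import json
       from pathlib import Path
       from pyspark.errors import error_classes

       # the errors package directory, i.e. python/pyspark/errors
       ERRORS_DIR = Path(__file__).parents[1]

       # sorted keys and a 2-space indent keep the regenerated file diff-friendly
       with open(ERRORS_DIR / "error-conditions.json", "w") as f:
           json.dump(error_classes.ERROR_CLASSES_MAP, f, sort_keys=True, indent=2)
   ```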




Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on PR #44920:
URL: https://github.com/apache/spark/pull/44920#issuecomment-1914109640

   > Is this a reference to this command?
   
   Yes, so you might need to fix the description at https://github.com/apache/spark/blob/master/python/pyspark/errors_doc_gen.py#L44.



Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon closed pull request #44920: [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file
URL: https://github.com/apache/spark/pull/44920



Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #44920:
URL: https://github.com/apache/spark/pull/44920#discussion_r1586979834


##########
python/MANIFEST.in:
##########
@@ -14,13 +14,18 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-global-exclude *.py[cod] __pycache__ .DS_Store
+# Reference: https://setuptools.pypa.io/en/latest/userguide/miscellaneous.html
+
+graft pyspark

Review Comment:
   @nchammas Seems like this ends up adding all the tests as well. Could we just include that JSON file alone?




Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

Posted by "nchammas (via GitHub)" <gi...@apache.org>.
nchammas commented on PR #44920:
URL: https://github.com/apache/spark/pull/44920#issuecomment-1914877483

   I think we should wait for the conversation in SPARK-46810 to resolve before merging this in.
   
   But apart from that, is there anything more you'd like me to check here? Do you approve of the use of `importlib.resources` (which I think is the "correct" solution in our case)?
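
   For context, a minimal sketch of the `importlib.resources`-based loading under discussion, assuming the JSON lives at `pyspark/errors/error-conditions.json` as in the rest of this thread:

   ```python
   import json
   import importlib.resources

   # read_text() resolves the resource whether the package is installed
   # as plain files or imported from a zip archive
   ERROR_CLASSES_MAP = json.loads(
       importlib.resources.read_text("pyspark.errors", "error-conditions.json")
   )
   ```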



Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

Posted by "nchammas (via GitHub)" <gi...@apache.org>.
nchammas commented on PR #44920:
URL: https://github.com/apache/spark/pull/44920#issuecomment-1916097263

   Converting to draft until [SPARK-46810](https://issues.apache.org/jira/browse/SPARK-46810) is resolved.



Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #44920:
URL: https://github.com/apache/spark/pull/44920#discussion_r1584127828


##########
python/pyspark/errors/error_classes.py:
##########
@@ -15,1160 +15,15 @@
 # limitations under the License.
 #
 
-# NOTE: Automatically sort this file via
-# - cd $SPARK_HOME
-# - bin/pyspark
-# - from pyspark.errors.exceptions import _write_self; _write_self()
 import json
+import importlib.resources
 
-
-ERROR_CLASSES_JSON = '''
-{
-  "APPLICATION_NAME_NOT_SET": {
-    "message": [
-      "An application name must be set in your configuration."
-    ]
-  },
-  "ARGUMENT_REQUIRED": {
-    "message": [
-      "Argument `<arg_name>` is required when <condition>."
-    ]
-  },
-  "ARROW_LEGACY_IPC_FORMAT": {
-    "message": [
-      "Arrow legacy IPC format is not supported in PySpark, please unset ARROW_PRE_0_15_IPC_FORMAT."
-    ]
-  },
-  "ATTRIBUTE_NOT_CALLABLE": {
-    "message": [
-      "Attribute `<attr_name>` in provided object `<obj_name>` is not callable."
-    ]
-  },
-  "ATTRIBUTE_NOT_SUPPORTED": {
-    "message": [
-      "Attribute `<attr_name>` is not supported."
-    ]
-  },
-  "AXIS_LENGTH_MISMATCH": {
-    "message": [
-      "Length mismatch: Expected axis has <expected_length> element, new values have <actual_length> elements."
-    ]
-  },
-  "BROADCAST_VARIABLE_NOT_LOADED": {
-    "message": [
-      "Broadcast variable `<variable>` not loaded."
-    ]
-  },
-  "CALL_BEFORE_INITIALIZE": {
-    "message": [
-      "Not supported to call `<func_name>` before initialize <object>."
-    ]
-  },
-  "CANNOT_ACCEPT_OBJECT_IN_TYPE": {
-    "message": [
-      "`<data_type>` can not accept object `<obj_name>` in type `<obj_type>`."
-    ]
-  },
-  "CANNOT_ACCESS_TO_DUNDER": {
-    "message": [
-      "Dunder(double underscore) attribute is for internal use only."
-    ]
-  },
-  "CANNOT_APPLY_IN_FOR_COLUMN": {
-    "message": [
-      "Cannot apply 'in' operator against a column: please use 'contains' in a string column or 'array_contains' function for an array column."
-    ]
-  },
-  "CANNOT_BE_EMPTY": {
-    "message": [
-      "At least one <item> must be specified."
-    ]
-  },
-  "CANNOT_BE_NONE": {
-    "message": [
-      "Argument `<arg_name>` cannot be None."
-    ]
-  },
-  "CANNOT_CONFIGURE_SPARK_CONNECT": {
-    "message": [
-      "Spark Connect server cannot be configured: Existing [<existing_url>], New [<new_url>]."
-    ]
-  },
-  "CANNOT_CONFIGURE_SPARK_CONNECT_MASTER": {
-    "message": [
-      "Spark Connect server and Spark master cannot be configured together: Spark master [<master_url>], Spark Connect [<connect_url>]."
-    ]
-  },
-  "CANNOT_CONVERT_COLUMN_INTO_BOOL": {
-    "message": [
-      "Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions."
-    ]
-  },
-  "CANNOT_CONVERT_TYPE": {
-    "message": [
-      "Cannot convert <from_type> into <to_type>."
-    ]
-  },
-  "CANNOT_DETERMINE_TYPE": {
-    "message": [
-      "Some of types cannot be determined after inferring."
-    ]
-  },
-  "CANNOT_GET_BATCH_ID": {
-    "message": [
-      "Could not get batch id from <obj_name>."
-    ]
-  },
-  "CANNOT_INFER_ARRAY_TYPE": {
-    "message": [
-      "Can not infer Array Type from a list with None as the first element."
-    ]
-  },
-  "CANNOT_INFER_EMPTY_SCHEMA": {
-    "message": [
-      "Can not infer schema from an empty dataset."
-    ]
-  },
-  "CANNOT_INFER_SCHEMA_FOR_TYPE": {
-    "message": [
-      "Can not infer schema for type: `<data_type>`."
-    ]
-  },
-  "CANNOT_INFER_TYPE_FOR_FIELD": {
-    "message": [
-      "Unable to infer the type of the field `<field_name>`."
-    ]
-  },
-  "CANNOT_MERGE_TYPE": {
-    "message": [
-      "Can not merge type `<data_type1>` and `<data_type2>`."
-    ]
-  },
-  "CANNOT_OPEN_SOCKET": {
-    "message": [
-      "Can not open socket: <errors>."
-    ]
-  },
-  "CANNOT_PARSE_DATATYPE": {
-    "message": [
-      "Unable to parse datatype. <msg>."
-    ]
-  },
-  "CANNOT_PROVIDE_METADATA": {
-    "message": [
-      "Metadata can only be provided for a single column."
-    ]
-  },
-  "CANNOT_SET_TOGETHER": {
-    "message": [
-      "<arg_list> should not be set together."
-    ]
-  },
-  "CANNOT_SPECIFY_RETURN_TYPE_FOR_UDF": {
-    "message": [
-      "returnType can not be specified when `<arg_name>` is a user-defined function, but got <return_type>."
-    ]
-  },
-  "CANNOT_WITHOUT": {
-    "message": [
-      "Cannot <condition1> without <condition2>."
-    ]
-  },
-  "COLUMN_IN_LIST": {
-    "message": [
-      "`<func_name>` does not allow a Column in a list."
-    ]
-  },
-  "CONNECT_URL_ALREADY_DEFINED": {
-    "message": [
-      "Only one Spark Connect client URL can be set; however, got a different URL [<new_url>] from the existing [<existing_url>]."
-    ]
-  },
-  "CONNECT_URL_NOT_SET": {
-    "message": [
-      "Cannot create a Spark Connect session because the Spark Connect remote URL has not been set. Please define the remote URL by setting either the 'spark.remote' option or the 'SPARK_REMOTE' environment variable."
-    ]
-  },
-  "CONTEXT_ONLY_VALID_ON_DRIVER": {
-    "message": [
-      "It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063."
-    ]
-  },
-  "CONTEXT_UNAVAILABLE_FOR_REMOTE_CLIENT": {
-    "message": [
-      "Remote client cannot create a SparkContext. Create SparkSession instead."
-    ]
-  },
-  "DATA_SOURCE_CREATE_ERROR": {
-    "message": [
-      "Failed to create python data source instance, error: <error>."
-    ]
-  },
-  "DATA_SOURCE_INVALID_RETURN_TYPE": {
-    "message": [
-      "Unsupported return type ('<type>') from Python data source '<name>'. Expected types: <supported_types>."
-    ]
-  },
-  "DATA_SOURCE_RETURN_SCHEMA_MISMATCH": {
-    "message": [
-      "Return schema mismatch in the result from 'read' method. Expected: <expected> columns, Found: <actual> columns. Make sure the returned values match the required output schema."
-    ]
-  },
-  "DATA_SOURCE_TYPE_MISMATCH": {
-    "message": [
-      "Expected <expected>, but got <actual>."
-    ]
-  },
-  "DIFFERENT_PANDAS_DATAFRAME": {
-    "message": [
-      "DataFrames are not almost equal:",
-      "Left:",
-      "<left>",
-      "<left_dtype>",
-      "Right:",
-      "<right>",
-      "<right_dtype>"
-    ]
-  },
-  "DIFFERENT_PANDAS_INDEX": {
-    "message": [
-      "Indices are not almost equal:",
-      "Left:",
-      "<left>",
-      "<left_dtype>",
-      "Right:",
-      "<right>",
-      "<right_dtype>"
-    ]
-  },
-  "DIFFERENT_PANDAS_MULTIINDEX": {
-    "message": [
-      "MultiIndices are not almost equal:",
-      "Left:",
-      "<left>",
-      "<left_dtype>",
-      "Right:",
-      "<right>",
-      "<right_dtype>"
-    ]
-  },
-  "DIFFERENT_PANDAS_SERIES": {
-    "message": [
-      "Series are not almost equal:",
-      "Left:",
-      "<left>",
-      "<left_dtype>",
-      "Right:",
-      "<right>",
-      "<right_dtype>"
-    ]
-  },
-  "DIFFERENT_ROWS": {
-    "message": [
-      "<error_msg>"
-    ]
-  },
-  "DIFFERENT_SCHEMA": {
-    "message": [
-      "Schemas do not match.",
-      "--- actual",
-      "+++ expected",
-      "<error_msg>"
-    ]
-  },
-  "DISALLOWED_TYPE_FOR_CONTAINER": {
-    "message": [
-      "Argument `<arg_name>`(type: <arg_type>) should only contain a type in [<allowed_types>], got <item_type>"
-    ]
-  },
-  "DUPLICATED_FIELD_NAME_IN_ARROW_STRUCT": {
-    "message": [
-      "Duplicated field names in Arrow Struct are not allowed, got <field_names>"
-    ]
-  },
-  "ERROR_OCCURRED_WHILE_CALLING": {
-    "message": [
-      "An error occurred while calling <func_name>: <error_msg>."
-    ]
-  },
-  "FIELD_DATA_TYPE_UNACCEPTABLE": {
-    "message": [
-      "<data_type> can not accept object <obj> in type <obj_type>."
-    ]
-  },
-  "FIELD_DATA_TYPE_UNACCEPTABLE_WITH_NAME": {
-    "message": [
-      "<field_name>: <data_type> can not accept object <obj> in type <obj_type>."
-    ]
-  },
-  "FIELD_NOT_NULLABLE": {
-    "message": [
-      "Field is not nullable, but got None."
-    ]
-  },
-  "FIELD_NOT_NULLABLE_WITH_NAME": {
-    "message": [
-      "<field_name>: This field is not nullable, but got None."
-    ]
-  },
-  "FIELD_STRUCT_LENGTH_MISMATCH": {
-    "message": [
-      "Length of object (<object_length>) does not match with length of fields (<field_length>)."
-    ]
-  },
-  "FIELD_STRUCT_LENGTH_MISMATCH_WITH_NAME": {
-    "message": [
-      "<field_name>: Length of object (<object_length>) does not match with length of fields (<field_length>)."
-    ]
-  },
-  "FIELD_TYPE_MISMATCH": {
-    "message": [
-      "<obj> is not an instance of type <data_type>."
-    ]
-  },
-  "FIELD_TYPE_MISMATCH_WITH_NAME": {
-    "message": [
-      "<field_name>: <obj> is not an instance of type <data_type>."
-    ]
-  },
-  "HIGHER_ORDER_FUNCTION_SHOULD_RETURN_COLUMN": {
-    "message": [
-      "Function `<func_name>` should return Column, got <return_type>."
-    ]
-  },
-  "INCORRECT_CONF_FOR_PROFILE": {
-    "message": [
-      "`spark.python.profile` or `spark.python.profile.memory` configuration",
-      " must be set to `true` to enable Python profile."
-    ]
-  },
-  "INDEX_NOT_POSITIVE": {
-    "message": [
-      "Index must be positive, got '<index>'."
-    ]
-  },
-  "INDEX_OUT_OF_RANGE": {
-    "message": [
-      "<arg_name> index out of range, got '<index>'."
-    ]
-  },
-  "INVALID_ARROW_UDTF_RETURN_TYPE": {
-    "message": [
-      "The return type of the arrow-optimized Python UDTF should be of type 'pandas.DataFrame', but the '<func>' method returned a value of type <return_type> with value: <value>."
-    ]
-  },
-  "INVALID_BROADCAST_OPERATION": {
-    "message": [
-      "Broadcast can only be <operation> in driver."
-    ]
-  },
-  "INVALID_CALL_ON_UNRESOLVED_OBJECT": {
-    "message": [
-      "Invalid call to `<func_name>` on unresolved object."
-    ]
-  },
-  "INVALID_CONNECT_URL": {
-    "message": [
-      "Invalid URL for Spark Connect: <detail>"
-    ]
-  },
-  "INVALID_INTERVAL_CASTING": {
-    "message": [
-      "Interval <start_field> to <end_field> is invalid."
-    ]
-  },
-  "INVALID_ITEM_FOR_CONTAINER": {
-    "message": [
-      "All items in `<arg_name>` should be in <allowed_types>, got <item_type>."
-    ]
-  },
-  "INVALID_MULTIPLE_ARGUMENT_CONDITIONS": {
-    "message": [
-      "[{arg_names}] cannot be <condition>."
-    ]
-  },
-  "INVALID_NDARRAY_DIMENSION": {
-    "message": [
-      "NumPy array input should be of <dimensions> dimensions."
-    ]
-  },
-  "INVALID_NUMBER_OF_DATAFRAMES_IN_GROUP": {
-    "message": [
-      "Invalid number of dataframes in group <dataframes_in_group>."
-    ]
-  },
-  "INVALID_PANDAS_UDF": {
-    "message": [
-      "Invalid function: <detail>"
-    ]
-  },
-  "INVALID_PANDAS_UDF_TYPE": {
-    "message": [
-      "`<arg_name>` should be one of the values from PandasUDFType, got <arg_type>"
-    ]
-  },
-  "INVALID_RETURN_TYPE_FOR_ARROW_UDF": {
-    "message": [
-      "Grouped and Cogrouped map Arrow UDF should return StructType for <eval_type>, got <return_type>."
-    ]
-  },
-  "INVALID_RETURN_TYPE_FOR_PANDAS_UDF": {
-    "message": [
-      "Pandas UDF should return StructType for <eval_type>, got <return_type>."
-    ]
-  },
-  "INVALID_SESSION_UUID_ID": {
-    "message": [
-      "Parameter value <arg_name> must be a valid UUID format: <origin>"
-    ]
-  },
-  "INVALID_TIMEOUT_TIMESTAMP": {
-    "message": [
-      "Timeout timestamp (<timestamp>) cannot be earlier than the current watermark (<watermark>)."
-    ]
-  },
-  "INVALID_TYPE": {
-    "message": [
-      "Argument `<arg_name>` should not be a <arg_type>."
-    ]
-  },
-  "INVALID_TYPENAME_CALL": {
-    "message": [
-      "StructField does not have typeName. Use typeName on its type explicitly instead."
-    ]
-  },
-  "INVALID_TYPE_DF_EQUALITY_ARG": {
-    "message": [
-      "Expected type <expected_type> for `<arg_name>` but got type <actual_type>."
-    ]
-  },
-  "INVALID_UDF_EVAL_TYPE": {
-    "message": [
-      "Eval type for UDF must be <eval_type>."
-    ]
-  },
-  "INVALID_UDTF_BOTH_RETURN_TYPE_AND_ANALYZE": {
-    "message": [
-      "The UDTF '<name>' is invalid. It has both its return type and an 'analyze' attribute. Please make it have one of either the return type or the 'analyze' static method in '<name>' and try again."
-    ]
-  },
-  "INVALID_UDTF_EVAL_TYPE": {
-    "message": [
-      "The eval type for the UDTF '<name>' is invalid. It must be one of <eval_type>."
-    ]
-  },
-  "INVALID_UDTF_HANDLER_TYPE": {
-    "message": [
-      "The UDTF is invalid. The function handler must be a class, but got '<type>'. Please provide a class as the function handler."
-    ]
-  },
-  "INVALID_UDTF_NO_EVAL": {
-    "message": [
-      "The UDTF '<name>' is invalid. It does not implement the required 'eval' method. Please implement the 'eval' method in '<name>' and try again."
-    ]
-  },
-  "INVALID_UDTF_RETURN_TYPE": {
-    "message": [
-      "The UDTF '<name>' is invalid. It does not specify its return type or implement the required 'analyze' static method. Please specify the return type or implement the 'analyze' static method in '<name>' and try again."
-    ]
-  },
-  "INVALID_WHEN_USAGE": {
-    "message": [
-      "when() can only be applied on a Column previously generated by when() function, and cannot be applied once otherwise() is applied."
-    ]
-  },
-  "INVALID_WINDOW_BOUND_TYPE": {
-    "message": [
-      "Invalid window bound type: <window_bound_type>."
-    ]
-  },
-  "JAVA_GATEWAY_EXITED": {
-    "message": [
-      "Java gateway process exited before sending its port number."
-    ]
-  },
-  "JVM_ATTRIBUTE_NOT_SUPPORTED": {
-    "message": [
-      "Attribute `<attr_name>` is not supported in Spark Connect as it depends on the JVM. If you need to use this attribute, do not use Spark Connect when creating your session. Visit https://spark.apache.org/docs/latest/sql-getting-started.html#starting-point-sparksession for creating regular Spark Session in detail."
-    ]
-  },
-  "KEY_NOT_EXISTS": {
-    "message": [
-      "Key `<key>` is not exists."
-    ]
-  },
-  "KEY_VALUE_PAIR_REQUIRED": {
-    "message": [
-      "Key-value pair or a list of pairs is required."
-    ]
-  },
-  "LENGTH_SHOULD_BE_THE_SAME": {
-    "message": [
-      "<arg1> and <arg2> should be of the same length, got <arg1_length> and <arg2_length>."
-    ]
-  },
-  "MASTER_URL_NOT_SET": {
-    "message": [
-      "A master URL must be set in your configuration."
-    ]
-  },
-  "MISSING_LIBRARY_FOR_PROFILER": {
-    "message": [
-      "Install the 'memory_profiler' library in the cluster to enable memory profiling."
-    ]
-  },
-  "MISSING_VALID_PLAN": {
-    "message": [
-      "Argument to <operator> does not contain a valid plan."
-    ]
-  },
-  "MIXED_TYPE_REPLACEMENT": {
-    "message": [
-      "Mixed type replacements are not supported."
-    ]
-  },
-  "NEGATIVE_VALUE": {
-    "message": [
-      "Value for `<arg_name>` must be greater than or equal to 0, got '<arg_value>'."
-    ]
-  },
-  "NOT_BOOL": {
-    "message": [
-      "Argument `<arg_name>` should be a bool, got <arg_type>."
-    ]
-  },
-  "NOT_BOOL_OR_DICT_OR_FLOAT_OR_INT_OR_LIST_OR_STR_OR_TUPLE": {
-    "message": [
-      "Argument `<arg_name>` should be a bool, dict, float, int, str or tuple, got <arg_type>."
-    ]
-  },
-  "NOT_BOOL_OR_DICT_OR_FLOAT_OR_INT_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a bool, dict, float, int or str, got <arg_type>."
-    ]
-  },
-  "NOT_BOOL_OR_FLOAT_OR_INT": {
-    "message": [
-      "Argument `<arg_name>` should be a bool, float or int, got <arg_type>."
-    ]
-  },
-  "NOT_BOOL_OR_FLOAT_OR_INT_OR_LIST_OR_NONE_OR_STR_OR_TUPLE": {
-    "message": [
-      "Argument `<arg_name>` should be a bool, float, int, list, None, str or tuple, got <arg_type>."
-    ]
-  },
-  "NOT_BOOL_OR_FLOAT_OR_INT_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a bool, float, int or str, got <arg_type>."
-    ]
-  },
-  "NOT_BOOL_OR_LIST": {
-    "message": [
-      "Argument `<arg_name>` should be a bool or list, got <arg_type>."
-    ]
-  },
-  "NOT_BOOL_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a bool or str, got <arg_type>."
-    ]
-  },
-  "NOT_CALLABLE": {
-    "message": [
-      "Argument `<arg_name>` should be a callable, got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN": {
-    "message": [
-      "Argument `<arg_name>` should be a Column, got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN_OR_DATATYPE_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a Column, str or DataType, but got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN_OR_FLOAT_OR_INT_OR_LIST_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a Column, float, integer, list or string, got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN_OR_INT": {
-    "message": [
-      "Argument `<arg_name>` should be a Column or int, got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN_OR_INT_OR_LIST_OR_STR_OR_TUPLE": {
-    "message": [
-      "Argument `<arg_name>` should be a Column, int, list, str or tuple, got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN_OR_INT_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a Column, int or str, got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN_OR_LIST_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a Column, list or str, got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a Column or str, got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN_OR_STR_OR_STRUCT": {
-    "message": [
-      "Argument `<arg_name>` should be a StructType, Column or str, got <arg_type>."
-    ]
-  },
-  "NOT_DATAFRAME": {
-    "message": [
-      "Argument `<arg_name>` should be a DataFrame, got <arg_type>."
-    ]
-  },
-  "NOT_DATATYPE_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a DataType or str, got <arg_type>."
-    ]
-  },
-  "NOT_DICT": {
-    "message": [
-      "Argument `<arg_name>` should be a dict, got <arg_type>."
-    ]
-  },
-  "NOT_EXPRESSION": {
-    "message": [
-      "Argument `<arg_name>` should be an Expression, got <arg_type>."
-    ]
-  },
-  "NOT_FLOAT_OR_INT": {
-    "message": [
-      "Argument `<arg_name>` should be a float or int, got <arg_type>."
-    ]
-  },
-  "NOT_FLOAT_OR_INT_OR_LIST_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a float, int, list or str, got <arg_type>."
-    ]
-  },
-  "NOT_IMPLEMENTED": {
-    "message": [
-      "<feature> is not implemented."
-    ]
-  },
-  "NOT_INT": {
-    "message": [
-      "Argument `<arg_name>` should be an int, got <arg_type>."
-    ]
-  },
-  "NOT_INT_OR_SLICE_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be an int, slice or str, got <arg_type>."
-    ]
-  },
-  "NOT_IN_BARRIER_STAGE": {
-    "message": [
-      "It is not in a barrier stage."
-    ]
-  },
-  "NOT_ITERABLE": {
-    "message": [
-      "<objectName> is not iterable."
-    ]
-  },
-  "NOT_LIST": {
-    "message": [
-      "Argument `<arg_name>` should be a list, got <arg_type>."
-    ]
-  },
-  "NOT_LIST_OF_COLUMN": {
-    "message": [
-      "Argument `<arg_name>` should be a list[Column]."
-    ]
-  },
-  "NOT_LIST_OF_COLUMN_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a list[Column]."
-    ]
-  },
-  "NOT_LIST_OF_FLOAT_OR_INT": {
-    "message": [
-      "Argument `<arg_name>` should be a list[float, int], got <arg_type>."
-    ]
-  },
-  "NOT_LIST_OF_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a list[str], got <arg_type>."
-    ]
-  },
-  "NOT_LIST_OR_NONE_OR_STRUCT": {
-    "message": [
-      "Argument `<arg_name>` should be a list, None or StructType, got <arg_type>."
-    ]
-  },
-  "NOT_LIST_OR_STR_OR_TUPLE": {
-    "message": [
-      "Argument `<arg_name>` should be a list, str or tuple, got <arg_type>."
-    ]
-  },
-  "NOT_LIST_OR_TUPLE": {
-    "message": [
-      "Argument `<arg_name>` should be a list or tuple, got <arg_type>."
-    ]
-  },
-  "NOT_NUMERIC_COLUMNS": {
-    "message": [
-      "Numeric aggregation function can only be applied on numeric columns, got <invalid_columns>."
-    ]
-  },
-  "NOT_OBSERVATION_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be an Observation or str, got <arg_type>."
-    ]
-  },
-  "NOT_SAME_TYPE": {
-    "message": [
-      "Argument `<arg_name1>` and `<arg_name2>` should be the same type, got <arg_type1> and <arg_type2>."
-    ]
-  },
-  "NOT_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a str, got <arg_type>."
-    ]
-  },
-  "NOT_STRUCT": {
-    "message": [
-      "Argument `<arg_name>` should be a struct type, got <arg_type>."
-    ]
-  },
-  "NOT_STR_OR_LIST_OF_RDD": {
-    "message": [
-      "Argument `<arg_name>` should be a str or list[RDD], got <arg_type>."
-    ]
-  },
-  "NOT_STR_OR_STRUCT": {
-    "message": [
-      "Argument `<arg_name>` should be a str or struct type, got <arg_type>."
-    ]
-  },
-  "NOT_WINDOWSPEC": {
-    "message": [
-      "Argument `<arg_name>` should be a WindowSpec, got <arg_type>."
-    ]
-  },
-  "NO_ACTIVE_EXCEPTION": {
-    "message": [
-      "No active exception."
-    ]
-  },
-  "NO_ACTIVE_OR_DEFAULT_SESSION": {
-    "message": [
-      "No active or default Spark session found. Please create a new Spark session before running the code."
-    ]
-  },
-  "NO_ACTIVE_SESSION": {
-    "message": [
-      "No active Spark session found. Please create a new Spark session before running the code."
-    ]
-  },
-  "NO_OBSERVE_BEFORE_GET": {
-    "message": [
-      "Should observe by calling `DataFrame.observe` before `get`."
-    ]
-  },
-  "NO_SCHEMA_AND_DRIVER_DEFAULT_SCHEME": {
-    "message": [
-      "Only allows <arg_name> to be a path without scheme, and Spark Driver should use the default scheme to determine the destination file system."
-    ]
-  },
-  "ONLY_ALLOWED_FOR_SINGLE_COLUMN": {
-    "message": [
-      "Argument `<arg_name>` can only be provided for a single column."
-    ]
-  },
-  "ONLY_ALLOW_SINGLE_TRIGGER": {
-    "message": [
-      "Only a single trigger is allowed."
-    ]
-  },
-  "ONLY_SUPPORTED_WITH_SPARK_CONNECT": {
-    "message": [
-      "<feature> is only supported with Spark Connect; however, the current Spark session does not use Spark Connect."
-    ]
-  },
-  "PACKAGE_NOT_INSTALLED": {
-    "message": [
-      "<package_name> >= <minimum_version> must be installed; however, it was not found."
-    ]
-  },
-  "PIPE_FUNCTION_EXITED": {
-    "message": [
-      "Pipe function `<func_name>` exited with error code <error_code>."
-    ]
-  },
-  "PYTHON_HASH_SEED_NOT_SET": {
-    "message": [
-      "Randomness of hash of string should be disabled via PYTHONHASHSEED."
-    ]
-  },
-  "PYTHON_STREAMING_DATA_SOURCE_RUNTIME_ERROR": {
-    "message": [
-      "Failed when running Python streaming data source: <msg>"
-    ]
-  },
-  "PYTHON_VERSION_MISMATCH": {
-    "message": [
-      "Python in worker has different version: <worker_version> than that in driver: <driver_version>, PySpark cannot run with different minor versions.",
-      "Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set."
-    ]
-  },
-  "RDD_TRANSFORM_ONLY_VALID_ON_DRIVER": {
-    "message": [
-      "It appears that you are attempting to broadcast an RDD or reference an RDD from an ",
-      "action or transformation. RDD transformations and actions can only be invoked by the ",
-      "driver, not inside of other transformations; for example, ",
-      "rdd1.map(lambda x: rdd2.values.count() * x) is invalid because the values ",
-      "transformation and count action cannot be performed inside of the rdd1.map ",
-      "transformation. For more information, see SPARK-5063."
-    ]
-  },
-  "READ_ONLY": {
-    "message": [
-      "<object> is read-only."
-    ]
-  },
-  "RESPONSE_ALREADY_RECEIVED": {
-    "message": [
-      "OPERATION_NOT_FOUND on the server but responses were already received from it."
-    ]
-  },
-  "RESULT_COLUMNS_MISMATCH_FOR_ARROW_UDF": {
-    "message": [
-      "Column names of the returned pyarrow.Table do not match specified schema.<missing><extra>"
-    ]
-  },
-  "RESULT_COLUMNS_MISMATCH_FOR_PANDAS_UDF": {
-    "message": [
-      "Column names of the returned pandas.DataFrame do not match specified schema.<missing><extra>"
-    ]
-  },
-  "RESULT_LENGTH_MISMATCH_FOR_PANDAS_UDF": {
-    "message": [
-      "Number of columns of the returned pandas.DataFrame doesn't match specified schema. Expected: <expected> Actual: <actual>"
-    ]
-  },
-  "RESULT_LENGTH_MISMATCH_FOR_SCALAR_ITER_PANDAS_UDF": {
-    "message": [
-      "The length of output in Scalar iterator pandas UDF should be the same with the input's; however, the length of output was <output_length> and the length of input was <input_length>."
-    ]
-  },
-  "RESULT_TYPE_MISMATCH_FOR_ARROW_UDF": {
-    "message": [
-      "Columns do not match in their data type: <mismatch>."
-    ]
-  },
-  "RETRIES_EXCEEDED": {
-    "message": [
-      "The maximum number of retries has been exceeded."
-    ]
-  },
-  "REUSE_OBSERVATION": {
-    "message": [
-      "An Observation can be used with a DataFrame only once."
-    ]
-  },
-  "SCHEMA_MISMATCH_FOR_PANDAS_UDF": {
-    "message": [
-      "Result vector from pandas_udf was not the required length: expected <expected>, got <actual>."
-    ]
-  },
-  "SESSION_ALREADY_EXIST": {
-    "message": [
-      "Cannot start a remote Spark session because there is a regular Spark session already running."
-    ]
-  },
-  "SESSION_NEED_CONN_STR_OR_BUILDER": {
-    "message": [
-      "Needs either connection string or channelBuilder (mutually exclusive) to create a new SparkSession."
-    ]
-  },
-  "SESSION_NOT_SAME": {
-    "message": [
-      "Both Datasets must belong to the same SparkSession."
-    ]
-  },
-  "SESSION_OR_CONTEXT_EXISTS": {
-    "message": [
-      "There should not be an existing Spark Session or Spark Context."
-    ]
-  },
-  "SESSION_OR_CONTEXT_NOT_EXISTS": {
-    "message": [
-      "SparkContext or SparkSession should be created first."
-    ]
-  },
-  "SLICE_WITH_STEP": {
-    "message": [
-      "Slice with step is not supported."
-    ]
-  },
-  "STATE_NOT_EXISTS": {
-    "message": [
-      "State is either not defined or has already been removed."
-    ]
-  },
-  "STOP_ITERATION_OCCURRED": {
-    "message": [
-      "Caught StopIteration thrown from user's code; failing the task: <exc>"
-    ]
-  },
-  "STOP_ITERATION_OCCURRED_FROM_SCALAR_ITER_PANDAS_UDF": {
-    "message": [
-      "pandas iterator UDF should exhaust the input iterator."
-    ]
-  },
-  "STREAMING_CONNECT_SERIALIZATION_ERROR": {
-    "message": [
-      "Cannot serialize the function `<name>`. If you accessed the Spark session, or a DataFrame defined outside of the function, or any object that contains a Spark session, please be aware that they are not allowed in Spark Connect. For `foreachBatch`, please access the Spark session using `df.sparkSession`, where `df` is the first parameter in your `foreachBatch` function. For `StreamingQueryListener`, please access the Spark session using `self.spark`. For details please check out the PySpark doc for `foreachBatch` and `StreamingQueryListener`."
-    ]
-  },
-  "TEST_CLASS_NOT_COMPILED": {
-    "message": [
-      "<test_class_path> doesn't exist. Spark sql test classes are not compiled."
-    ]
-  },
-  "TOO_MANY_VALUES": {
-    "message": [
-      "Expected <expected> values for `<item>`, got <actual>."
-    ]
-  },
-  "TYPE_HINT_SHOULD_BE_SPECIFIED": {
-    "message": [
-      "Type hints for <target> should be specified; however, got <sig>."
-    ]
-  },
-  "UDF_RETURN_TYPE": {
-    "message": [
-      "Return type of the user-defined function should be <expected>, but is <actual>."
-    ]
-  },
-  "UDTF_ARROW_TYPE_CAST_ERROR": {
-    "message": [
-      "Cannot convert the output value of the column '<col_name>' with type '<col_type>' to the specified return type of the column: '<arrow_type>'. Please check if the data types match and try again."
-    ]
-  },
-  "UDTF_CONSTRUCTOR_INVALID_IMPLEMENTS_ANALYZE_METHOD": {
-    "message": [
-      "Failed to evaluate the user-defined table function '<name>' because its constructor is invalid: the function implements the 'analyze' method, but its constructor has more than two arguments (including the 'self' reference). Please update the table function so that its constructor accepts exactly one 'self' argument, or one 'self' argument plus another argument for the result of the 'analyze' method, and try the query again."
-    ]
-  },
-  "UDTF_CONSTRUCTOR_INVALID_NO_ANALYZE_METHOD": {
-    "message": [
-      "Failed to evaluate the user-defined table function '<name>' because its constructor is invalid: the function does not implement the 'analyze' method, and its constructor has more than one argument (including the 'self' reference). Please update the table function so that its constructor accepts exactly one 'self' argument, and try the query again."
-    ]
-  },
-  "UDTF_EVAL_METHOD_ARGUMENTS_DO_NOT_MATCH_SIGNATURE": {
-    "message": [
-      "Failed to evaluate the user-defined table function '<name>' because the function arguments did not match the expected signature of the 'eval' method (<reason>). Please update the query so that this table function call provides arguments matching the expected signature, or else update the table function so that its 'eval' method accepts the provided arguments, and then try the query again."
-    ]
-  },
-  "UDTF_EXEC_ERROR": {
-    "message": [
-      "User defined table function encountered an error in the '<method_name>' method: <error>"
-    ]
-  },
-  "UDTF_INVALID_OUTPUT_ROW_TYPE": {
-    "message": [
-      "The type of an individual output row in the '<func>' method of the UDTF is invalid. Each row should be a tuple, list, or dict, but got '<type>'. Please make sure that the output rows are of the correct type."
-    ]
-  },
-  "UDTF_RETURN_NOT_ITERABLE": {
-    "message": [
-      "The return value of the '<func>' method of the UDTF is invalid. It should be an iterable (e.g., generator or list), but got '<type>'. Please make sure that the UDTF returns one of these types."
-    ]
-  },
-  "UDTF_RETURN_SCHEMA_MISMATCH": {
-    "message": [
-      "The number of columns in the result does not match the specified schema. Expected column count: <expected>, Actual column count: <actual>. Please make sure the values returned by the '<func>' method have the same number of columns as specified in the output schema."
-    ]
-  },
-  "UDTF_RETURN_TYPE_MISMATCH": {
-    "message": [
-      "Mismatch in return type for the UDTF '<name>'. Expected a 'StructType', but got '<return_type>'. Please ensure the return type is a correctly formatted StructType."
-    ]
-  },
-  "UDTF_SERIALIZATION_ERROR": {
-    "message": [
-      "Cannot serialize the UDTF '<name>': <message>"
-    ]
-  },
-  "UNEXPECTED_RESPONSE_FROM_SERVER": {
-    "message": [
-      "Unexpected response from iterator server."
-    ]
-  },
-  "UNEXPECTED_TUPLE_WITH_STRUCT": {
-    "message": [
-      "Unexpected tuple <tuple> with StructType."
-    ]
-  },
-  "UNKNOWN_EXPLAIN_MODE": {
-    "message": [
-      "Unknown explain mode: '<explain_mode>'. Accepted explain modes are 'simple', 'extended', 'codegen', 'cost', 'formatted'."
-    ]
-  },
-  "UNKNOWN_INTERRUPT_TYPE": {
-    "message": [
-      "Unknown interrupt type: '<interrupt_type>'. Accepted interrupt types are 'all'."
-    ]
-  },
-  "UNKNOWN_RESPONSE": {
-    "message": [
-      "Unknown response: <response>."
-    ]
-  },
-  "UNKNOWN_VALUE_FOR": {
-    "message": [
-      "Unknown value for `<var>`."
-    ]
-  },
-  "UNSUPPORTED_DATA_TYPE": {
-    "message": [
-      "Unsupported DataType `<data_type>`."
-    ]
-  },
-  "UNSUPPORTED_DATA_TYPE_FOR_ARROW": {
-    "message": [
-      "Single data type <data_type> is not supported with Arrow."
-    ]
-  },
-  "UNSUPPORTED_DATA_TYPE_FOR_ARROW_CONVERSION": {
-    "message": [
-      "<data_type> is not supported in conversion to Arrow."
-    ]
-  },
-  "UNSUPPORTED_DATA_TYPE_FOR_ARROW_VERSION": {
-    "message": [
-      "<data_type> is only supported with pyarrow 2.0.0 and above."
-    ]
-  },
-  "UNSUPPORTED_JOIN_TYPE": {
-    "message": [
-      "Unsupported join type: <join_type>. Supported join types include: 'inner', 'outer', 'full', 'fullouter', 'full_outer', 'leftouter', 'left', 'left_outer', 'rightouter', 'right', 'right_outer', 'leftsemi', 'left_semi', 'semi', 'leftanti', 'left_anti', 'anti', 'cross'."
-    ]
-  },
-  "UNSUPPORTED_LITERAL": {
-    "message": [
-      "Unsupported Literal '<literal>'."
-    ]
-  },
-  "UNSUPPORTED_LOCAL_CONNECTION_STRING": {
-    "message": [
-      "Creating new SparkSessions with `local` connection string is not supported."
-    ]
-  },
-  "UNSUPPORTED_NUMPY_ARRAY_SCALAR": {
-    "message": [
-      "The type of array scalar '<dtype>' is not supported."
-    ]
-  },
-  "UNSUPPORTED_OPERATION": {
-    "message": [
-      "<operation> is not supported."
-    ]
-  },
-  "UNSUPPORTED_PACKAGE_VERSION": {
-    "message": [
-      "<package_name> >= <minimum_version> must be installed; however, your version is <current_version>."
-    ]
-  },
-  "UNSUPPORTED_PARAM_TYPE_FOR_HIGHER_ORDER_FUNCTION": {
-    "message": [
-      "Function `<func_name>` should use only POSITIONAL or POSITIONAL OR KEYWORD arguments."
-    ]
-  },
-  "UNSUPPORTED_SIGNATURE": {
-    "message": [
-      "Unsupported signature: <signature>."
-    ]
-  },
-  "UNSUPPORTED_WITH_ARROW_OPTIMIZATION": {
-    "message": [
-      "<feature> is not supported with Arrow optimization enabled in Python UDFs. Disable 'spark.sql.execution.pythonUDF.arrow.enabled' to workaround."
-    ]
-  },
-  "VALUE_ALLOWED": {
-    "message": [
-      "Value for `<arg_name>` does not allow <disallowed_value>."
-    ]
-  },
-  "VALUE_NOT_ACCESSIBLE": {
-    "message": [
-      "Value `<value>` cannot be accessed inside tasks."
-    ]
-  },
-  "VALUE_NOT_ALLOWED": {
-    "message": [
-      "Value for `<arg_name>` has to be amongst the following values: <allowed_values>."
-    ]
-  },
-  "VALUE_NOT_ANY_OR_ALL": {
-    "message": [
-      "Value for `<arg_name>` must be 'any' or 'all', got '<arg_value>'."
-    ]
-  },
-  "VALUE_NOT_BETWEEN": {
-    "message": [
-      "Value for `<arg_name>` must be between <min> and <max>."
-    ]
-  },
-  "VALUE_NOT_NON_EMPTY_STR": {
-    "message": [
-      "Value for `<arg_name>` must be a non-empty string, got '<arg_value>'."
-    ]
-  },
-  "VALUE_NOT_PEARSON": {
-    "message": [
-      "Value for `<arg_name>` only supports the 'pearson', got '<arg_value>'."
-    ]
-  },
-  "VALUE_NOT_PLAIN_COLUMN_REFERENCE": {
-    "message": [
-      "Value `<val>` in `<field_name>` should be a plain column reference such as `df.col` or `col('column')`."
-    ]
-  },
-  "VALUE_NOT_POSITIVE": {
-    "message": [
-      "Value for `<arg_name>` must be positive, got '<arg_value>'."
-    ]
-  },
-  "VALUE_NOT_TRUE": {
-    "message": [
-      "Value for `<arg_name>` must be True, got '<arg_value>'."
-    ]
-  },
-  "VALUE_OUT_OF_BOUNDS": {
-    "message": [
-      "Value for `<arg_name>` must be between <lower_bound> and <upper_bound> (inclusive), got <actual>"
-    ]
-  },
-  "WRONG_NUM_ARGS_FOR_HIGHER_ORDER_FUNCTION": {
-    "message": [
-      "Function `<func_name>` should take between 1 and 3 arguments, but the provided function takes <num_args>."
-    ]
-  },
-  "WRONG_NUM_COLUMNS": {
-    "message": [
-      "Function `<func_name>` should take at least <num_cols> columns."
-    ]
-  },
-  "ZERO_INDEX": {
-    "message": [
-      "Index must be non-zero."
-    ]
-  }
-}
-'''
-
+# Note: Though we call them "error classes" here, the proper name is "error conditions",
+#   hence why the name of the JSON file is different.
+#   For more information, please see: https://issues.apache.org/jira/browse/SPARK-46810
+#   This discrepancy will be resolved as part of: https://issues.apache.org/jira/browse/SPARK-47429
+# Note: When we drop support for Python 3.8, we should migrate from importlib.resources.read_text()

Review Comment:
   We dropped Python 3.8 recently :-) at https://github.com/apache/spark/pull/46228
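
   With 3.8 gone, a hedged sketch of what the migration target could look like, assuming the `importlib.resources.files()` API available since Python 3.9:

   ```python
   import json
   from importlib.resources import files

   # files() returns a Traversable; like read_text(), it also works
   # when the package is imported from a zip archive
   ERROR_CLASSES_MAP = json.loads(
       files("pyspark.errors").joinpath("error-conditions.json").read_text()
   )
   ```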




Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on PR #44920:
URL: https://github.com/apache/spark/pull/44920#issuecomment-1913888854

   IIRC there was no major issue with managing the JSON itself. However, since we cannot integrate with the [error-classes.json](https://github.com/databricks/runtime/blob/master/common/utils/src/main/resources/error/error-classes.json) file on the JVM side (we didn't want a JVM dependency), we simply adopted a `.py` file, which is a more convenient way to manage things in Python.
   
   So I agree with changing to a JSON file if the advantage of a JSON file over a `.py` file is clear, and if there are no issues with packaging. You might also need to take a deeper look at the documentation. For example, we're using the `.py` file to build the documentation for [Error classes in PySpark](https://spark.apache.org/docs/latest/api/python/development/errors.html#error-classes-in-pyspark), as sketched below.
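
   To illustrate, a hypothetical sketch of the kind of iteration such a doc generator performs; the real generator lives in `python/pyspark/errors_doc_gen.py`, and the `message` structure follows the JSON shown elsewhere in this thread:

   ```python
   from pyspark.errors.error_classes import ERROR_CLASSES_MAP

   # emit each error class name followed by its message template lines
   for name, info in sorted(ERROR_CLASSES_MAP.items()):
       print(name)
       print("\n".join(info["message"]))
       print()
   ```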



Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #44920:
URL: https://github.com/apache/spark/pull/44920#issuecomment-2026650601

   People can actually use PySpark directly by importing it from `pyspark.zip`; see https://spark.apache.org/docs/latest/api/python/getting_started/install.html?highlight=pythonpath#manually-downloading
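
   A minimal sketch of that usage, assuming `SPARK_HOME` points at a manually downloaded distribution (zip file names vary by release):

   ```python
   import glob
   import os
   import sys

   # put pyspark.zip and the bundled py4j zip on the import path
   lib = os.path.join(os.environ["SPARK_HOME"], "python", "lib")
   sys.path.insert(0, os.path.join(lib, "pyspark.zip"))
   sys.path.insert(0, glob.glob(os.path.join(lib, "py4j-*.zip"))[0])

   import pyspark  # resource loading must still work from inside the zip
   ```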



Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #44920:
URL: https://github.com/apache/spark/pull/44920#discussion_r1586988988


##########
python/MANIFEST.in:
##########
@@ -14,13 +14,18 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-global-exclude *.py[cod] __pycache__ .DS_Store
+# Reference: https://setuptools.pypa.io/en/latest/userguide/miscellaneous.html
+
+graft pyspark

Review Comment:
   I agree that it's safer so we don't miss anything ... but let's just add the JSON file alone ... I think it's more important to get rid of unrelated files ...
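
   A hedged sketch of what that could look like in `python/MANIFEST.in`, using the setuptools `include` directive with a path relative to the project root:

   ```
   include pyspark/errors/error-conditions.json
   ```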




Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

Posted by "nchammas (via GitHub)" <gi...@apache.org>.
nchammas commented on code in PR #44920:
URL: https://github.com/apache/spark/pull/44920#discussion_r1468973275


##########
python/pyspark/errors/error-conditions.json:
##########
@@ -0,0 +1,1096 @@
+{

Review Comment:
   Yes, good call out. I was looking at `MANIFEST.in` and I believe this file should be included, but I will confirm.




Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #44920:
URL: https://github.com/apache/spark/pull/44920#discussion_r1584127568


##########
python/pyspark/errors/error_classes.py:
##########
@@ -15,1110 +15,14 @@
 # limitations under the License.
 #
 
-# NOTE: Automatically sort this file via
-# - cd $SPARK_HOME
-# - bin/pyspark
-# - from pyspark.errors.exceptions import _write_self; _write_self()
 import json
+import importlib.resources
 
-
-ERROR_CLASSES_JSON = '''
-{
-  "APPLICATION_NAME_NOT_SET": {
-    "message": [
-      "An application name must be set in your configuration."
-    ]
-  },
-  "ARGUMENT_REQUIRED": {
-    "message": [
-      "Argument `<arg_name>` is required when <condition>."
-    ]
-  },
-  "ARROW_LEGACY_IPC_FORMAT": {
-    "message": [
-      "Arrow legacy IPC format is not supported in PySpark, please unset ARROW_PRE_0_15_IPC_FORMAT."
-    ]
-  },
-  "ATTRIBUTE_NOT_CALLABLE": {
-    "message": [
-      "Attribute `<attr_name>` in provided object `<obj_name>` is not callable."
-    ]
-  },
-  "ATTRIBUTE_NOT_SUPPORTED": {
-    "message": [
-      "Attribute `<attr_name>` is not supported."
-    ]
-  },
-  "AXIS_LENGTH_MISMATCH": {
-    "message": [
-      "Length mismatch: Expected axis has <expected_length> element, new values have <actual_length> elements."
-    ]
-  },
-  "BROADCAST_VARIABLE_NOT_LOADED": {
-    "message": [
-      "Broadcast variable `<variable>` not loaded."
-    ]
-  },
-  "CALL_BEFORE_INITIALIZE": {
-    "message": [
-      "Not supported to call `<func_name>` before initialize <object>."
-    ]
-  },
-  "CANNOT_ACCEPT_OBJECT_IN_TYPE": {
-    "message": [
-      "`<data_type>` can not accept object `<obj_name>` in type `<obj_type>`."
-    ]
-  },
-  "CANNOT_ACCESS_TO_DUNDER": {
-    "message": [
-      "Dunder(double underscore) attribute is for internal use only."
-    ]
-  },
-  "CANNOT_APPLY_IN_FOR_COLUMN": {
-    "message": [
-      "Cannot apply 'in' operator against a column: please use 'contains' in a string column or 'array_contains' function for an array column."
-    ]
-  },
-  "CANNOT_BE_EMPTY": {
-    "message": [
-      "At least one <item> must be specified."
-    ]
-  },
-  "CANNOT_BE_NONE": {
-    "message": [
-      "Argument `<arg_name>` cannot be None."
-    ]
-  },
-  "CANNOT_CONFIGURE_SPARK_CONNECT": {
-    "message": [
-      "Spark Connect server cannot be configured: Existing [<existing_url>], New [<new_url>]."
-    ]
-  },
-  "CANNOT_CONFIGURE_SPARK_CONNECT_MASTER": {
-    "message": [
-      "Spark Connect server and Spark master cannot be configured together: Spark master [<master_url>], Spark Connect [<connect_url>]."
-    ]
-  },
-  "CANNOT_CONVERT_COLUMN_INTO_BOOL": {
-    "message": [
-      "Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions."
-    ]
-  },
-  "CANNOT_CONVERT_TYPE": {
-    "message": [
-      "Cannot convert <from_type> into <to_type>."
-    ]
-  },
-  "CANNOT_DETERMINE_TYPE": {
-    "message": [
-      "Some of types cannot be determined after inferring."
-    ]
-  },
-  "CANNOT_GET_BATCH_ID": {
-    "message": [
-      "Could not get batch id from <obj_name>."
-    ]
-  },
-  "CANNOT_INFER_ARRAY_TYPE": {
-    "message": [
-      "Can not infer Array Type from a list with None as the first element."
-    ]
-  },
-  "CANNOT_INFER_EMPTY_SCHEMA": {
-    "message": [
-      "Can not infer schema from an empty dataset."
-    ]
-  },
-  "CANNOT_INFER_SCHEMA_FOR_TYPE": {
-    "message": [
-      "Can not infer schema for type: `<data_type>`."
-    ]
-  },
-  "CANNOT_INFER_TYPE_FOR_FIELD": {
-    "message": [
-      "Unable to infer the type of the field `<field_name>`."
-    ]
-  },
-  "CANNOT_MERGE_TYPE": {
-    "message": [
-      "Can not merge type `<data_type1>` and `<data_type2>`."
-    ]
-  },
-  "CANNOT_OPEN_SOCKET": {
-    "message": [
-      "Can not open socket: <errors>."
-    ]
-  },
-  "CANNOT_PARSE_DATATYPE": {
-    "message": [
-      "Unable to parse datatype. <msg>."
-    ]
-  },
-  "CANNOT_PROVIDE_METADATA": {
-    "message": [
-      "Metadata can only be provided for a single column."
-    ]
-  },
-  "CANNOT_SET_TOGETHER": {
-    "message": [
-      "<arg_list> should not be set together."
-    ]
-  },
-  "CANNOT_SPECIFY_RETURN_TYPE_FOR_UDF": {
-    "message": [
-      "returnType can not be specified when `<arg_name>` is a user-defined function, but got <return_type>."
-    ]
-  },
-  "CANNOT_WITHOUT": {
-    "message": [
-      "Cannot <condition1> without <condition2>."
-    ]
-  },
-  "COLUMN_IN_LIST": {
-    "message": [
-      "`<func_name>` does not allow a Column in a list."
-    ]
-  },
-  "CONNECT_URL_ALREADY_DEFINED": {
-    "message": [
-      "Only one Spark Connect client URL can be set; however, got a different URL [<new_url>] from the existing [<existing_url>]."
-    ]
-  },
-  "CONNECT_URL_NOT_SET": {
-    "message": [
-      "Cannot create a Spark Connect session because the Spark Connect remote URL has not been set. Please define the remote URL by setting either the 'spark.remote' option or the 'SPARK_REMOTE' environment variable."
-    ]
-  },
-  "CONTEXT_ONLY_VALID_ON_DRIVER": {
-    "message": [
-      "It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063."
-    ]
-  },
-  "CONTEXT_UNAVAILABLE_FOR_REMOTE_CLIENT": {
-    "message": [
-      "Remote client cannot create a SparkContext. Create SparkSession instead."
-    ]
-  },
-  "DATA_SOURCE_INVALID_RETURN_TYPE": {
-    "message": [
-      "Unsupported return type ('<type>') from Python data source '<name>'. Expected types: <supported_types>."
-    ]
-  },
-  "DATA_SOURCE_RETURN_SCHEMA_MISMATCH": {
-    "message": [
-      "Return schema mismatch in the result from 'read' method. Expected: <expected> columns, Found: <actual> columns. Make sure the returned values match the required output schema."
-    ]
-  },
-  "DATA_SOURCE_TYPE_MISMATCH": {
-    "message": [
-      "Expected <expected>, but got <actual>."
-    ]
-  },
-  "DIFFERENT_PANDAS_DATAFRAME": {
-    "message": [
-      "DataFrames are not almost equal:",
-      "Left:",
-      "<left>",
-      "<left_dtype>",
-      "Right:",
-      "<right>",
-      "<right_dtype>"
-    ]
-  },
-  "DIFFERENT_PANDAS_INDEX": {
-    "message": [
-      "Indices are not almost equal:",
-      "Left:",
-      "<left>",
-      "<left_dtype>",
-      "Right:",
-      "<right>",
-      "<right_dtype>"
-    ]
-  },
-  "DIFFERENT_PANDAS_MULTIINDEX": {
-    "message": [
-      "MultiIndices are not almost equal:",
-      "Left:",
-      "<left>",
-      "<left_dtype>",
-      "Right:",
-      "<right>",
-      "<right_dtype>"
-    ]
-  },
-  "DIFFERENT_PANDAS_SERIES": {
-    "message": [
-      "Series are not almost equal:",
-      "Left:",
-      "<left>",
-      "<left_dtype>",
-      "Right:",
-      "<right>",
-      "<right_dtype>"
-    ]
-  },
-  "DIFFERENT_ROWS": {
-    "message": [
-      "<error_msg>"
-    ]
-  },
-  "DIFFERENT_SCHEMA": {
-    "message": [
-      "Schemas do not match.",
-      "--- actual",
-      "+++ expected",
-      "<error_msg>"
-    ]
-  },
-  "DISALLOWED_TYPE_FOR_CONTAINER": {
-    "message": [
-      "Argument `<arg_name>`(type: <arg_type>) should only contain a type in [<allowed_types>], got <item_type>"
-    ]
-  },
-  "DUPLICATED_FIELD_NAME_IN_ARROW_STRUCT": {
-    "message": [
-      "Duplicated field names in Arrow Struct are not allowed, got <field_names>"
-    ]
-  },
-  "ERROR_OCCURRED_WHILE_CALLING": {
-    "message": [
-      "An error occurred while calling <func_name>: <error_msg>."
-    ]
-  },
-  "HIGHER_ORDER_FUNCTION_SHOULD_RETURN_COLUMN": {
-    "message": [
-      "Function `<func_name>` should return Column, got <return_type>."
-    ]
-  },
-  "INCORRECT_CONF_FOR_PROFILE": {
-    "message": [
-      "`spark.python.profile` or `spark.python.profile.memory` configuration",
-      " must be set to `true` to enable Python profile."
-    ]
-  },
-  "INDEX_NOT_POSITIVE": {
-    "message": [
-      "Index must be positive, got '<index>'."
-    ]
-  },
-  "INDEX_OUT_OF_RANGE": {
-    "message": [
-      "<arg_name> index out of range, got '<index>'."
-    ]
-  },
-  "INVALID_ARROW_UDTF_RETURN_TYPE": {
-    "message": [
-      "The return type of the arrow-optimized Python UDTF should be of type 'pandas.DataFrame', but the '<func>' method returned a value of type <return_type> with value: <value>."
-    ]
-  },
-  "INVALID_BROADCAST_OPERATION": {
-    "message": [
-      "Broadcast can only be <operation> in driver."
-    ]
-  },
-  "INVALID_CALL_ON_UNRESOLVED_OBJECT": {
-    "message": [
-      "Invalid call to `<func_name>` on unresolved object."
-    ]
-  },
-  "INVALID_CONNECT_URL": {
-    "message": [
-      "Invalid URL for Spark Connect: <detail>"
-    ]
-  },
-  "INVALID_INTERVAL_CASTING": {
-    "message": [
-      "Interval <start_field> to <end_field> is invalid."
-    ]
-  },
-  "INVALID_ITEM_FOR_CONTAINER": {
-    "message": [
-      "All items in `<arg_name>` should be in <allowed_types>, got <item_type>."
-    ]
-  },
-  "INVALID_MULTIPLE_ARGUMENT_CONDITIONS": {
-    "message": [
-      "[{arg_names}] cannot be <condition>."
-    ]
-  },
-  "INVALID_NDARRAY_DIMENSION": {
-    "message": [
-      "NumPy array input should be of <dimensions> dimensions."
-    ]
-  },
-  "INVALID_NUMBER_OF_DATAFRAMES_IN_GROUP": {
-    "message": [
-      "Invalid number of dataframes in group <dataframes_in_group>."
-    ]
-  },
-  "INVALID_PANDAS_UDF": {
-    "message": [
-      "Invalid function: <detail>"
-    ]
-  },
-  "INVALID_PANDAS_UDF_TYPE": {
-    "message": [
-      "`<arg_name>` should be one of the values from PandasUDFType, got <arg_type>"
-    ]
-  },
-  "INVALID_RETURN_TYPE_FOR_ARROW_UDF": {
-    "message": [
-      "Grouped and Cogrouped map Arrow UDF should return StructType for <eval_type>, got <return_type>."
-    ]
-  },
-  "INVALID_RETURN_TYPE_FOR_PANDAS_UDF": {
-    "message": [
-      "Pandas UDF should return StructType for <eval_type>, got <return_type>."
-    ]
-  },
-  "INVALID_SESSION_UUID_ID": {
-    "message": [
-      "Parameter value <arg_name> must be a valid UUID format: <origin>"
-    ]
-  },
-  "INVALID_TIMEOUT_TIMESTAMP": {
-    "message": [
-      "Timeout timestamp (<timestamp>) cannot be earlier than the current watermark (<watermark>)."
-    ]
-  },
-  "INVALID_TYPE": {
-    "message": [
-      "Argument `<arg_name>` should not be a <arg_type>."
-    ]
-  },
-  "INVALID_TYPENAME_CALL": {
-    "message": [
-      "StructField does not have typeName. Use typeName on its type explicitly instead."
-    ]
-  },
-  "INVALID_TYPE_DF_EQUALITY_ARG": {
-    "message": [
-      "Expected type <expected_type> for `<arg_name>` but got type <actual_type>."
-    ]
-  },
-  "INVALID_UDF_EVAL_TYPE": {
-    "message": [
-      "Eval type for UDF must be <eval_type>."
-    ]
-  },
-  "INVALID_UDTF_BOTH_RETURN_TYPE_AND_ANALYZE": {
-    "message": [
-      "The UDTF '<name>' is invalid. It has both its return type and an 'analyze' attribute. Please make it have one of either the return type or the 'analyze' static method in '<name>' and try again."
-    ]
-  },
-  "INVALID_UDTF_EVAL_TYPE": {
-    "message": [
-      "The eval type for the UDTF '<name>' is invalid. It must be one of <eval_type>."
-    ]
-  },
-  "INVALID_UDTF_HANDLER_TYPE": {
-    "message": [
-      "The UDTF is invalid. The function handler must be a class, but got '<type>'. Please provide a class as the function handler."
-    ]
-  },
-  "INVALID_UDTF_NO_EVAL": {
-    "message": [
-      "The UDTF '<name>' is invalid. It does not implement the required 'eval' method. Please implement the 'eval' method in '<name>' and try again."
-    ]
-  },
-  "INVALID_UDTF_RETURN_TYPE": {
-    "message": [
-      "The UDTF '<name>' is invalid. It does not specify its return type or implement the required 'analyze' static method. Please specify the return type or implement the 'analyze' static method in '<name>' and try again."
-    ]
-  },
-  "INVALID_WHEN_USAGE": {
-    "message": [
-      "when() can only be applied on a Column previously generated by when() function, and cannot be applied once otherwise() is applied."
-    ]
-  },
-  "INVALID_WINDOW_BOUND_TYPE": {
-    "message": [
-      "Invalid window bound type: <window_bound_type>."
-    ]
-  },
-  "JAVA_GATEWAY_EXITED": {
-    "message": [
-      "Java gateway process exited before sending its port number."
-    ]
-  },
-  "JVM_ATTRIBUTE_NOT_SUPPORTED": {
-    "message": [
-      "Attribute `<attr_name>` is not supported in Spark Connect as it depends on the JVM. If you need to use this attribute, do not use Spark Connect when creating your session. Visit https://spark.apache.org/docs/latest/sql-getting-started.html#starting-point-sparksession for creating regular Spark Session in detail."
-    ]
-  },
-  "KEY_NOT_EXISTS": {
-    "message": [
-      "Key `<key>` is not exists."
-    ]
-  },
-  "KEY_VALUE_PAIR_REQUIRED": {
-    "message": [
-      "Key-value pair or a list of pairs is required."
-    ]
-  },
-  "LENGTH_SHOULD_BE_THE_SAME": {
-    "message": [
-      "<arg1> and <arg2> should be of the same length, got <arg1_length> and <arg2_length>."
-    ]
-  },
-  "MASTER_URL_NOT_SET": {
-    "message": [
-      "A master URL must be set in your configuration."
-    ]
-  },
-  "MISSING_LIBRARY_FOR_PROFILER": {
-    "message": [
-      "Install the 'memory_profiler' library in the cluster to enable memory profiling."
-    ]
-  },
-  "MISSING_VALID_PLAN": {
-    "message": [
-      "Argument to <operator> does not contain a valid plan."
-    ]
-  },
-  "MIXED_TYPE_REPLACEMENT": {
-    "message": [
-      "Mixed type replacements are not supported."
-    ]
-  },
-  "NEGATIVE_VALUE": {
-    "message": [
-      "Value for `<arg_name>` must be greater than or equal to 0, got '<arg_value>'."
-    ]
-  },
-  "NOT_BOOL": {
-    "message": [
-      "Argument `<arg_name>` should be a bool, got <arg_type>."
-    ]
-  },
-  "NOT_BOOL_OR_DICT_OR_FLOAT_OR_INT_OR_LIST_OR_STR_OR_TUPLE": {
-    "message": [
-      "Argument `<arg_name>` should be a bool, dict, float, int, str or tuple, got <arg_type>."
-    ]
-  },
-  "NOT_BOOL_OR_DICT_OR_FLOAT_OR_INT_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a bool, dict, float, int or str, got <arg_type>."
-    ]
-  },
-  "NOT_BOOL_OR_FLOAT_OR_INT": {
-    "message": [
-      "Argument `<arg_name>` should be a bool, float or int, got <arg_type>."
-    ]
-  },
-  "NOT_BOOL_OR_FLOAT_OR_INT_OR_LIST_OR_NONE_OR_STR_OR_TUPLE": {
-    "message": [
-      "Argument `<arg_name>` should be a bool, float, int, list, None, str or tuple, got <arg_type>."
-    ]
-  },
-  "NOT_BOOL_OR_FLOAT_OR_INT_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a bool, float, int or str, got <arg_type>."
-    ]
-  },
-  "NOT_BOOL_OR_LIST": {
-    "message": [
-      "Argument `<arg_name>` should be a bool or list, got <arg_type>."
-    ]
-  },
-  "NOT_BOOL_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a bool or str, got <arg_type>."
-    ]
-  },
-  "NOT_CALLABLE": {
-    "message": [
-      "Argument `<arg_name>` should be a callable, got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN": {
-    "message": [
-      "Argument `<arg_name>` should be a Column, got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN_OR_DATATYPE_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a Column, str or DataType, but got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN_OR_FLOAT_OR_INT_OR_LIST_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a Column, float, integer, list or string, got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN_OR_INT": {
-    "message": [
-      "Argument `<arg_name>` should be a Column or int, got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN_OR_INT_OR_LIST_OR_STR_OR_TUPLE": {
-    "message": [
-      "Argument `<arg_name>` should be a Column, int, list, str or tuple, got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN_OR_INT_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a Column, int or str, got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN_OR_LIST_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a Column, list or str, got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a Column or str, got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN_OR_STR_OR_STRUCT": {
-    "message": [
-      "Argument `<arg_name>` should be a StructType, Column or str, got <arg_type>."
-    ]
-  },
-  "NOT_DATAFRAME": {
-    "message": [
-      "Argument `<arg_name>` should be a DataFrame, got <arg_type>."
-    ]
-  },
-  "NOT_DATATYPE_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a DataType or str, got <arg_type>."
-    ]
-  },
-  "NOT_DICT": {
-    "message": [
-      "Argument `<arg_name>` should be a dict, got <arg_type>."
-    ]
-  },
-  "NOT_EXPRESSION": {
-    "message": [
-      "Argument `<arg_name>` should be an Expression, got <arg_type>."
-    ]
-  },
-  "NOT_FLOAT_OR_INT": {
-    "message": [
-      "Argument `<arg_name>` should be a float or int, got <arg_type>."
-    ]
-  },
-  "NOT_FLOAT_OR_INT_OR_LIST_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a float, int, list or str, got <arg_type>."
-    ]
-  },
-  "NOT_IMPLEMENTED": {
-    "message": [
-      "<feature> is not implemented."
-    ]
-  },
-  "NOT_INSTANCE_OF": {
-    "message": [
-      "<value> is not an instance of type <type>."
-    ]
-  },
-  "NOT_INT": {
-    "message": [
-      "Argument `<arg_name>` should be an int, got <arg_type>."
-    ]
-  },
-  "NOT_INT_OR_SLICE_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be an int, slice or str, got <arg_type>."
-    ]
-  },
-  "NOT_IN_BARRIER_STAGE": {
-    "message": [
-      "It is not in a barrier stage."
-    ]
-  },
-  "NOT_ITERABLE": {
-    "message": [
-      "<objectName> is not iterable."
-    ]
-  },
-  "NOT_LIST": {
-    "message": [
-      "Argument `<arg_name>` should be a list, got <arg_type>."
-    ]
-  },
-  "NOT_LIST_OF_COLUMN": {
-    "message": [
-      "Argument `<arg_name>` should be a list[Column]."
-    ]
-  },
-  "NOT_LIST_OF_COLUMN_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a list[Column]."
-    ]
-  },
-  "NOT_LIST_OF_FLOAT_OR_INT": {
-    "message": [
-      "Argument `<arg_name>` should be a list[float, int], got <arg_type>."
-    ]
-  },
-  "NOT_LIST_OF_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a list[str], got <arg_type>."
-    ]
-  },
-  "NOT_LIST_OR_NONE_OR_STRUCT": {
-    "message": [
-      "Argument `<arg_name>` should be a list, None or StructType, got <arg_type>."
-    ]
-  },
-  "NOT_LIST_OR_STR_OR_TUPLE": {
-    "message": [
-      "Argument `<arg_name>` should be a list, str or tuple, got <arg_type>."
-    ]
-  },
-  "NOT_LIST_OR_TUPLE": {
-    "message": [
-      "Argument `<arg_name>` should be a list or tuple, got <arg_type>."
-    ]
-  },
-  "NOT_NUMERIC_COLUMNS": {
-    "message": [
-      "Numeric aggregation function can only be applied on numeric columns, got <invalid_columns>."
-    ]
-  },
-  "NOT_OBSERVATION_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be an Observation or str, got <arg_type>."
-    ]
-  },
-  "NOT_SAME_TYPE": {
-    "message": [
-      "Argument `<arg_name1>` and `<arg_name2>` should be the same type, got <arg_type1> and <arg_type2>."
-    ]
-  },
-  "NOT_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a str, got <arg_type>."
-    ]
-  },
-  "NOT_STRUCT": {
-    "message": [
-      "Argument `<arg_name>` should be a struct type, got <arg_type>."
-    ]
-  },
-  "NOT_STR_OR_LIST_OF_RDD": {
-    "message": [
-      "Argument `<arg_name>` should be a str or list[RDD], got <arg_type>."
-    ]
-  },
-  "NOT_STR_OR_STRUCT": {
-    "message": [
-      "Argument `<arg_name>` should be a str or struct type, got <arg_type>."
-    ]
-  },
-  "NOT_WINDOWSPEC": {
-    "message": [
-      "Argument `<arg_name>` should be a WindowSpec, got <arg_type>."
-    ]
-  },
-  "NO_ACTIVE_EXCEPTION": {
-    "message": [
-      "No active exception."
-    ]
-  },
-  "NO_ACTIVE_OR_DEFAULT_SESSION": {
-    "message": [
-      "No active or default Spark session found. Please create a new Spark session before running the code."
-    ]
-  },
-  "NO_ACTIVE_SESSION": {
-    "message": [
-      "No active Spark session found. Please create a new Spark session before running the code."
-    ]
-  },
-  "NO_OBSERVE_BEFORE_GET": {
-    "message": [
-      "Should observe by calling `DataFrame.observe` before `get`."
-    ]
-  },
-  "NO_SCHEMA_AND_DRIVER_DEFAULT_SCHEME": {
-    "message": [
-      "Only allows <arg_name> to be a path without scheme, and Spark Driver should use the default scheme to determine the destination file system."
-    ]
-  },
-  "ONLY_ALLOWED_FOR_SINGLE_COLUMN": {
-    "message": [
-      "Argument `<arg_name>` can only be provided for a single column."
-    ]
-  },
-  "ONLY_ALLOW_SINGLE_TRIGGER": {
-    "message": [
-      "Only a single trigger is allowed."
-    ]
-  },
-  "ONLY_SUPPORTED_WITH_SPARK_CONNECT": {
-    "message": [
-      "<feature> is only supported with Spark Connect; however, the current Spark session does not use Spark Connect."
-    ]
-  },
-  "PACKAGE_NOT_INSTALLED": {
-    "message": [
-      "<package_name> >= <minimum_version> must be installed; however, it was not found."
-    ]
-  },
-  "PIPE_FUNCTION_EXITED": {
-    "message": [
-      "Pipe function `<func_name>` exited with error code <error_code>."
-    ]
-  },
-  "PYTHON_HASH_SEED_NOT_SET": {
-    "message": [
-      "Randomness of hash of string should be disabled via PYTHONHASHSEED."
-    ]
-  },
-  "PYTHON_VERSION_MISMATCH": {
-    "message": [
-      "Python in worker has different version: <worker_version> than that in driver: <driver_version>, PySpark cannot run with different minor versions.",
-      "Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set."
-    ]
-  },
-  "RDD_TRANSFORM_ONLY_VALID_ON_DRIVER": {
-    "message": [
-      "It appears that you are attempting to broadcast an RDD or reference an RDD from an ",
-      "action or transformation. RDD transformations and actions can only be invoked by the ",
-      "driver, not inside of other transformations; for example, ",
-      "rdd1.map(lambda x: rdd2.values.count() * x) is invalid because the values ",
-      "transformation and count action cannot be performed inside of the rdd1.map ",
-      "transformation. For more information, see SPARK-5063."
-    ]
-  },
-  "READ_ONLY": {
-    "message": [
-      "<object> is read-only."
-    ]
-  },
-  "RESPONSE_ALREADY_RECEIVED": {
-    "message": [
-      "OPERATION_NOT_FOUND on the server but responses were already received from it."
-    ]
-  },
-  "RESULT_COLUMNS_MISMATCH_FOR_ARROW_UDF": {
-    "message": [
-      "Column names of the returned pyarrow.Table do not match specified schema.<missing><extra>"
-    ]
-  },
-  "RESULT_COLUMNS_MISMATCH_FOR_PANDAS_UDF": {
-    "message": [
-      "Column names of the returned pandas.DataFrame do not match specified schema.<missing><extra>"
-    ]
-  },
-  "RESULT_LENGTH_MISMATCH_FOR_PANDAS_UDF": {
-    "message": [
-      "Number of columns of the returned pandas.DataFrame doesn't match specified schema. Expected: <expected> Actual: <actual>"
-    ]
-  },
-  "RESULT_LENGTH_MISMATCH_FOR_SCALAR_ITER_PANDAS_UDF": {
-    "message": [
-      "The length of output in Scalar iterator pandas UDF should be the same with the input's; however, the length of output was <output_length> and the length of input was <input_length>."
-    ]
-  },
-  "RESULT_TYPE_MISMATCH_FOR_ARROW_UDF": {
-    "message": [
-      "Columns do not match in their data type: <mismatch>."
-    ]
-  },
-  "RETRIES_EXCEEDED": {
-    "message": [
-      "The maximum number of retries has been exceeded."
-    ]
-  },
-  "REUSE_OBSERVATION": {
-    "message": [
-      "An Observation can be used with a DataFrame only once."
-    ]
-  },
-  "SCHEMA_MISMATCH_FOR_PANDAS_UDF": {
-    "message": [
-      "Result vector from pandas_udf was not the required length: expected <expected>, got <actual>."
-    ]
-  },
-  "SESSION_ALREADY_EXIST": {
-    "message": [
-      "Cannot start a remote Spark session because there is a regular Spark session already running."
-    ]
-  },
-  "SESSION_NEED_CONN_STR_OR_BUILDER": {
-    "message": [
-      "Needs either connection string or channelBuilder (mutually exclusive) to create a new SparkSession."
-    ]
-  },
-  "SESSION_NOT_SAME": {
-    "message": [
-      "Both Datasets must belong to the same SparkSession."
-    ]
-  },
-  "SESSION_OR_CONTEXT_EXISTS": {
-    "message": [
-      "There should not be an existing Spark Session or Spark Context."
-    ]
-  },
-  "SESSION_OR_CONTEXT_NOT_EXISTS": {
-    "message": [
-      "SparkContext or SparkSession should be created first."
-    ]
-  },
-  "SLICE_WITH_STEP": {
-    "message": [
-      "Slice with step is not supported."
-    ]
-  },
-  "STATE_NOT_EXISTS": {
-    "message": [
-      "State is either not defined or has already been removed."
-    ]
-  },
-  "STOP_ITERATION_OCCURRED": {
-    "message": [
-      "Caught StopIteration thrown from user's code; failing the task: <exc>"
-    ]
-  },
-  "STOP_ITERATION_OCCURRED_FROM_SCALAR_ITER_PANDAS_UDF": {
-    "message": [
-      "pandas iterator UDF should exhaust the input iterator."
-    ]
-  },
-  "STREAMING_CONNECT_SERIALIZATION_ERROR": {
-    "message": [
-      "Cannot serialize the function `<name>`. If you accessed the Spark session, or a DataFrame defined outside of the function, or any object that contains a Spark session, please be aware that they are not allowed in Spark Connect. For `foreachBatch`, please access the Spark session using `df.sparkSession`, where `df` is the first parameter in your `foreachBatch` function. For `StreamingQueryListener`, please access the Spark session using `self.spark`. For details please check out the PySpark doc for `foreachBatch` and `StreamingQueryListener`."
-    ]
-  },
-  "TEST_CLASS_NOT_COMPILED": {
-    "message": [
-      "<test_class_path> doesn't exist. Spark sql test classes are not compiled."
-    ]
-  },
-  "TOO_MANY_VALUES": {
-    "message": [
-      "Expected <expected> values for `<item>`, got <actual>."
-    ]
-  },
-  "TYPE_HINT_SHOULD_BE_SPECIFIED": {
-    "message": [
-      "Type hints for <target> should be specified; however, got <sig>."
-    ]
-  },
-  "UDF_RETURN_TYPE": {
-    "message": [
-      "Return type of the user-defined function should be <expected>, but is <actual>."
-    ]
-  },
-  "UDTF_ARROW_TYPE_CAST_ERROR": {
-    "message": [
-      "Cannot convert the output value of the column '<col_name>' with type '<col_type>' to the specified return type of the column: '<arrow_type>'. Please check if the data types match and try again."
-    ]
-  },
-  "UDTF_CONSTRUCTOR_INVALID_IMPLEMENTS_ANALYZE_METHOD": {
-    "message": [
-      "Failed to evaluate the user-defined table function '<name>' because its constructor is invalid: the function implements the 'analyze' method, but its constructor has more than two arguments (including the 'self' reference). Please update the table function so that its constructor accepts exactly one 'self' argument, or one 'self' argument plus another argument for the result of the 'analyze' method, and try the query again."
-    ]
-  },
-  "UDTF_CONSTRUCTOR_INVALID_NO_ANALYZE_METHOD": {
-    "message": [
-      "Failed to evaluate the user-defined table function '<name>' because its constructor is invalid: the function does not implement the 'analyze' method, and its constructor has more than one argument (including the 'self' reference). Please update the table function so that its constructor accepts exactly one 'self' argument, and try the query again."
-    ]
-  },
-  "UDTF_EVAL_METHOD_ARGUMENTS_DO_NOT_MATCH_SIGNATURE": {
-    "message": [
-      "Failed to evaluate the user-defined table function '<name>' because the function arguments did not match the expected signature of the 'eval' method (<reason>). Please update the query so that this table function call provides arguments matching the expected signature, or else update the table function so that its 'eval' method accepts the provided arguments, and then try the query again."
-    ]
-  },
-  "UDTF_EXEC_ERROR": {
-    "message": [
-      "User defined table function encountered an error in the '<method_name>' method: <error>"
-    ]
-  },
-  "UDTF_INVALID_OUTPUT_ROW_TYPE": {
-    "message": [
-      "The type of an individual output row in the '<func>' method of the UDTF is invalid. Each row should be a tuple, list, or dict, but got '<type>'. Please make sure that the output rows are of the correct type."
-    ]
-  },
-  "UDTF_RETURN_NOT_ITERABLE": {
-    "message": [
-      "The return value of the '<func>' method of the UDTF is invalid. It should be an iterable (e.g., generator or list), but got '<type>'. Please make sure that the UDTF returns one of these types."
-    ]
-  },
-  "UDTF_RETURN_SCHEMA_MISMATCH": {
-    "message": [
-      "The number of columns in the result does not match the specified schema. Expected column count: <expected>, Actual column count: <actual>. Please make sure the values returned by the '<func>' method have the same number of columns as specified in the output schema."
-    ]
-  },
-  "UDTF_RETURN_TYPE_MISMATCH": {
-    "message": [
-      "Mismatch in return type for the UDTF '<name>'. Expected a 'StructType', but got '<return_type>'. Please ensure the return type is a correctly formatted StructType."
-    ]
-  },
-  "UDTF_SERIALIZATION_ERROR": {
-    "message": [
-      "Cannot serialize the UDTF '<name>': <message>"
-    ]
-  },
-  "UNEXPECTED_RESPONSE_FROM_SERVER": {
-    "message": [
-      "Unexpected response from iterator server."
-    ]
-  },
-  "UNEXPECTED_TUPLE_WITH_STRUCT": {
-    "message": [
-      "Unexpected tuple <tuple> with StructType."
-    ]
-  },
-  "UNKNOWN_EXPLAIN_MODE": {
-    "message": [
-      "Unknown explain mode: '<explain_mode>'. Accepted explain modes are 'simple', 'extended', 'codegen', 'cost', 'formatted'."
-    ]
-  },
-  "UNKNOWN_INTERRUPT_TYPE": {
-    "message": [
-      "Unknown interrupt type: '<interrupt_type>'. Accepted interrupt types are 'all'."
-    ]
-  },
-  "UNKNOWN_RESPONSE": {
-    "message": [
-      "Unknown response: <response>."
-    ]
-  },
-  "UNKNOWN_VALUE_FOR": {
-    "message": [
-      "Unknown value for `<var>`."
-    ]
-  },
-  "UNSUPPORTED_DATA_TYPE": {
-    "message": [
-      "Unsupported DataType `<data_type>`."
-    ]
-  },
-  "UNSUPPORTED_DATA_TYPE_FOR_ARROW": {
-    "message": [
-      "Single data type <data_type> is not supported with Arrow."
-    ]
-  },
-  "UNSUPPORTED_DATA_TYPE_FOR_ARROW_CONVERSION": {
-    "message": [
-      "<data_type> is not supported in conversion to Arrow."
-    ]
-  },
-  "UNSUPPORTED_DATA_TYPE_FOR_ARROW_VERSION": {
-    "message": [
-      "<data_type> is only supported with pyarrow 2.0.0 and above."
-    ]
-  },
-  "UNSUPPORTED_JOIN_TYPE": {
-    "message": [
-      "Unsupported join type: <join_type>. Supported join types include: 'inner', 'outer', 'full', 'fullouter', 'full_outer', 'leftouter', 'left', 'left_outer', 'rightouter', 'right', 'right_outer', 'leftsemi', 'left_semi', 'semi', 'leftanti', 'left_anti', 'anti', 'cross'."
-    ]
-  },
-  "UNSUPPORTED_LITERAL": {
-    "message": [
-      "Unsupported Literal '<literal>'."
-    ]
-  },
-  "UNSUPPORTED_LOCAL_CONNECTION_STRING": {
-    "message": [
-      "Creating new SparkSessions with `local` connection string is not supported."
-    ]
-  },
-  "UNSUPPORTED_NUMPY_ARRAY_SCALAR": {
-    "message": [
-      "The type of array scalar '<dtype>' is not supported."
-    ]
-  },
-  "UNSUPPORTED_OPERATION": {
-    "message": [
-      "<operation> is not supported."
-    ]
-  },
-  "UNSUPPORTED_PACKAGE_VERSION": {
-    "message": [
-      "<package_name> >= <minimum_version> must be installed; however, your version is <current_version>."
-    ]
-  },
-  "UNSUPPORTED_PARAM_TYPE_FOR_HIGHER_ORDER_FUNCTION": {
-    "message": [
-      "Function `<func_name>` should use only POSITIONAL or POSITIONAL OR KEYWORD arguments."
-    ]
-  },
-  "UNSUPPORTED_SIGNATURE": {
-    "message": [
-      "Unsupported signature: <signature>."
-    ]
-  },
-  "UNSUPPORTED_WITH_ARROW_OPTIMIZATION": {
-    "message": [
-      "<feature> is not supported with Arrow optimization enabled in Python UDFs. Disable 'spark.sql.execution.pythonUDF.arrow.enabled' to workaround."
-    ]
-  },
-  "VALUE_ALLOWED": {
-    "message": [
-      "Value for `<arg_name>` does not allow <disallowed_value>."
-    ]
-  },
-  "VALUE_NOT_ACCESSIBLE": {
-    "message": [
-      "Value `<value>` cannot be accessed inside tasks."
-    ]
-  },
-  "VALUE_NOT_ALLOWED": {
-    "message": [
-      "Value for `<arg_name>` has to be amongst the following values: <allowed_values>."
-    ]
-  },
-  "VALUE_NOT_ANY_OR_ALL": {
-    "message": [
-      "Value for `<arg_name>` must be 'any' or 'all', got '<arg_value>'."
-    ]
-  },
-  "VALUE_NOT_BETWEEN": {
-    "message": [
-      "Value for `<arg_name>` must be between <min> and <max>."
-    ]
-  },
-  "VALUE_NOT_NON_EMPTY_STR": {
-    "message": [
-      "Value for `<arg_name>` must be a non-empty string, got '<arg_value>'."
-    ]
-  },
-  "VALUE_NOT_PEARSON": {
-    "message": [
-      "Value for `<arg_name>` only supports the 'pearson', got '<arg_value>'."
-    ]
-  },
-  "VALUE_NOT_PLAIN_COLUMN_REFERENCE": {
-    "message": [
-      "Value `<val>` in `<field_name>` should be a plain column reference such as `df.col` or `col('column')`."
-    ]
-  },
-  "VALUE_NOT_POSITIVE": {
-    "message": [
-      "Value for `<arg_name>` must be positive, got '<arg_value>'."
-    ]
-  },
-  "VALUE_NOT_TRUE": {
-    "message": [
-      "Value for `<arg_name>` must be True, got '<arg_value>'."
-    ]
-  },
-  "VALUE_OUT_OF_BOUND": {
-    "message": [
-      "Value for `<arg_name>` must be greater than <lower_bound> or less than <upper_bound>, got <actual>"
-    ]
-  },
-  "WRONG_NUM_ARGS_FOR_HIGHER_ORDER_FUNCTION": {
-    "message": [
-      "Function `<func_name>` should take between 1 and 3 arguments, but the provided function takes <num_args>."
-    ]
-  },
-  "WRONG_NUM_COLUMNS": {
-    "message": [
-      "Function `<func_name>` should take at least <num_cols> columns."
-    ]
-  }
-}
-'''
-
+# Note: Though we call them "error classes" here, the proper name is "error conditions",
#   hence why the name of the JSON file is different.
+#   For more information, please see: https://issues.apache.org/jira/browse/SPARK-46810
+# Note: When we drop support for Python 3.8, we should migrate from importlib.resources.read_text()
+#   to importlib.resources.files().joinpath().read_text().
+#   See: https://docs.python.org/3/library/importlib.resources.html#importlib.resources.open_text
+ERROR_CLASSES_JSON = importlib.resources.read_text("pyspark.errors", "error-conditions.json")

Review Comment:
   Oh okay here
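
   A minimal sketch of the migration path mentioned in the note above, assuming a
   package that ships error-conditions.json as a resource the way this change
   arranges. Both styles load the JSON from the installed package rather than from
   a path relative to the working directory:

       import json
       import importlib.resources

       # Python 3.8-compatible API (deprecated since Python 3.11):
       raw = importlib.resources.read_text("pyspark.errors", "error-conditions.json")

       # Python 3.9+ replacement referenced in the note:
       raw = (
           importlib.resources.files("pyspark.errors")
           .joinpath("error-conditions.json")
           .read_text(encoding="utf-8")
       )

       ERROR_CLASSES_MAP = json.loads(raw)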



##########
python/pyspark/errors/exceptions/__init__.py:
##########
@@ -18,39 +18,15 @@
 
 def _write_self() -> None:
     import json
+    from pathlib import Path
     from pyspark.errors import error_classes
 
-    with open("python/pyspark/errors/error_classes.py", "w") as f:
-        error_class_py_file = """#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#    http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-
-# NOTE: Automatically sort this file via
-# - cd $SPARK_HOME
-# - bin/pyspark
-# - from pyspark.errors.exceptions import _write_self; _write_self()
-import json
-
-
-ERROR_CLASSES_JSON = '''
-%s
-'''
+    ERRORS_DIR = Path(__file__).parents[1]
 
-ERROR_CLASSES_MAP = json.loads(ERROR_CLASSES_JSON)
-""" % json.dumps(
-            error_classes.ERROR_CLASSES_MAP, sort_keys=True, indent=2
+    with open(ERRORS_DIR / "error-conditions.json", "w") as f:

Review Comment:
   Oh oops
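
   For context, a hedged sketch of what the rewritten helper plausibly looks like
   in full; the json.dump arguments mirror the sort_keys=True, indent=2 settings of
   the old generator shown in the deleted lines, and the trailing newline is an
   assumption for diff-friendly output:

       def _write_self() -> None:
           # Development-only helper: regenerates error-conditions.json from the
           # in-memory map. Not executed on install or at runtime.
           import json
           from pathlib import Path
           from pyspark.errors import error_classes

           # errors/ directory, i.e. the parent of errors/exceptions/
           ERRORS_DIR = Path(__file__).parents[1]

           with open(ERRORS_DIR / "error-conditions.json", "w") as f:
               json.dump(error_classes.ERROR_CLASSES_MAP, f, sort_keys=True, indent=2)
               f.write("\n")  # assumption: keep a trailing newline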





Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #44920:
URL: https://github.com/apache/spark/pull/44920#discussion_r1584127828


##########
python/pyspark/errors/error_classes.py:
##########
@@ -15,1160 +15,15 @@
 # limitations under the License.
 #
 
-# NOTE: Automatically sort this file via
-# - cd $SPARK_HOME
-# - bin/pyspark
-# - from pyspark.errors.exceptions import _write_self; _write_self()
 import json
+import importlib.resources
 
-
-ERROR_CLASSES_JSON = '''
-{
-  "APPLICATION_NAME_NOT_SET": {
-    "message": [
-      "An application name must be set in your configuration."
-    ]
-  },
-  "ARGUMENT_REQUIRED": {
-    "message": [
-      "Argument `<arg_name>` is required when <condition>."
-    ]
-  },
-  "ARROW_LEGACY_IPC_FORMAT": {
-    "message": [
-      "Arrow legacy IPC format is not supported in PySpark, please unset ARROW_PRE_0_15_IPC_FORMAT."
-    ]
-  },
-  "ATTRIBUTE_NOT_CALLABLE": {
-    "message": [
-      "Attribute `<attr_name>` in provided object `<obj_name>` is not callable."
-    ]
-  },
-  "ATTRIBUTE_NOT_SUPPORTED": {
-    "message": [
-      "Attribute `<attr_name>` is not supported."
-    ]
-  },
-  "AXIS_LENGTH_MISMATCH": {
-    "message": [
-      "Length mismatch: Expected axis has <expected_length> element, new values have <actual_length> elements."
-    ]
-  },
-  "BROADCAST_VARIABLE_NOT_LOADED": {
-    "message": [
-      "Broadcast variable `<variable>` not loaded."
-    ]
-  },
-  "CALL_BEFORE_INITIALIZE": {
-    "message": [
-      "Not supported to call `<func_name>` before initialize <object>."
-    ]
-  },
-  "CANNOT_ACCEPT_OBJECT_IN_TYPE": {
-    "message": [
-      "`<data_type>` can not accept object `<obj_name>` in type `<obj_type>`."
-    ]
-  },
-  "CANNOT_ACCESS_TO_DUNDER": {
-    "message": [
-      "Dunder(double underscore) attribute is for internal use only."
-    ]
-  },
-  "CANNOT_APPLY_IN_FOR_COLUMN": {
-    "message": [
-      "Cannot apply 'in' operator against a column: please use 'contains' in a string column or 'array_contains' function for an array column."
-    ]
-  },
-  "CANNOT_BE_EMPTY": {
-    "message": [
-      "At least one <item> must be specified."
-    ]
-  },
-  "CANNOT_BE_NONE": {
-    "message": [
-      "Argument `<arg_name>` cannot be None."
-    ]
-  },
-  "CANNOT_CONFIGURE_SPARK_CONNECT": {
-    "message": [
-      "Spark Connect server cannot be configured: Existing [<existing_url>], New [<new_url>]."
-    ]
-  },
-  "CANNOT_CONFIGURE_SPARK_CONNECT_MASTER": {
-    "message": [
-      "Spark Connect server and Spark master cannot be configured together: Spark master [<master_url>], Spark Connect [<connect_url>]."
-    ]
-  },
-  "CANNOT_CONVERT_COLUMN_INTO_BOOL": {
-    "message": [
-      "Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions."
-    ]
-  },
-  "CANNOT_CONVERT_TYPE": {
-    "message": [
-      "Cannot convert <from_type> into <to_type>."
-    ]
-  },
-  "CANNOT_DETERMINE_TYPE": {
-    "message": [
-      "Some of types cannot be determined after inferring."
-    ]
-  },
-  "CANNOT_GET_BATCH_ID": {
-    "message": [
-      "Could not get batch id from <obj_name>."
-    ]
-  },
-  "CANNOT_INFER_ARRAY_TYPE": {
-    "message": [
-      "Can not infer Array Type from a list with None as the first element."
-    ]
-  },
-  "CANNOT_INFER_EMPTY_SCHEMA": {
-    "message": [
-      "Can not infer schema from an empty dataset."
-    ]
-  },
-  "CANNOT_INFER_SCHEMA_FOR_TYPE": {
-    "message": [
-      "Can not infer schema for type: `<data_type>`."
-    ]
-  },
-  "CANNOT_INFER_TYPE_FOR_FIELD": {
-    "message": [
-      "Unable to infer the type of the field `<field_name>`."
-    ]
-  },
-  "CANNOT_MERGE_TYPE": {
-    "message": [
-      "Can not merge type `<data_type1>` and `<data_type2>`."
-    ]
-  },
-  "CANNOT_OPEN_SOCKET": {
-    "message": [
-      "Can not open socket: <errors>."
-    ]
-  },
-  "CANNOT_PARSE_DATATYPE": {
-    "message": [
-      "Unable to parse datatype. <msg>."
-    ]
-  },
-  "CANNOT_PROVIDE_METADATA": {
-    "message": [
-      "Metadata can only be provided for a single column."
-    ]
-  },
-  "CANNOT_SET_TOGETHER": {
-    "message": [
-      "<arg_list> should not be set together."
-    ]
-  },
-  "CANNOT_SPECIFY_RETURN_TYPE_FOR_UDF": {
-    "message": [
-      "returnType can not be specified when `<arg_name>` is a user-defined function, but got <return_type>."
-    ]
-  },
-  "CANNOT_WITHOUT": {
-    "message": [
-      "Cannot <condition1> without <condition2>."
-    ]
-  },
-  "COLUMN_IN_LIST": {
-    "message": [
-      "`<func_name>` does not allow a Column in a list."
-    ]
-  },
-  "CONNECT_URL_ALREADY_DEFINED": {
-    "message": [
-      "Only one Spark Connect client URL can be set; however, got a different URL [<new_url>] from the existing [<existing_url>]."
-    ]
-  },
-  "CONNECT_URL_NOT_SET": {
-    "message": [
-      "Cannot create a Spark Connect session because the Spark Connect remote URL has not been set. Please define the remote URL by setting either the 'spark.remote' option or the 'SPARK_REMOTE' environment variable."
-    ]
-  },
-  "CONTEXT_ONLY_VALID_ON_DRIVER": {
-    "message": [
-      "It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063."
-    ]
-  },
-  "CONTEXT_UNAVAILABLE_FOR_REMOTE_CLIENT": {
-    "message": [
-      "Remote client cannot create a SparkContext. Create SparkSession instead."
-    ]
-  },
-  "DATA_SOURCE_CREATE_ERROR": {
-    "message": [
-      "Failed to create python data source instance, error: <error>."
-    ]
-  },
-  "DATA_SOURCE_INVALID_RETURN_TYPE": {
-    "message": [
-      "Unsupported return type ('<type>') from Python data source '<name>'. Expected types: <supported_types>."
-    ]
-  },
-  "DATA_SOURCE_RETURN_SCHEMA_MISMATCH": {
-    "message": [
-      "Return schema mismatch in the result from 'read' method. Expected: <expected> columns, Found: <actual> columns. Make sure the returned values match the required output schema."
-    ]
-  },
-  "DATA_SOURCE_TYPE_MISMATCH": {
-    "message": [
-      "Expected <expected>, but got <actual>."
-    ]
-  },
-  "DIFFERENT_PANDAS_DATAFRAME": {
-    "message": [
-      "DataFrames are not almost equal:",
-      "Left:",
-      "<left>",
-      "<left_dtype>",
-      "Right:",
-      "<right>",
-      "<right_dtype>"
-    ]
-  },
-  "DIFFERENT_PANDAS_INDEX": {
-    "message": [
-      "Indices are not almost equal:",
-      "Left:",
-      "<left>",
-      "<left_dtype>",
-      "Right:",
-      "<right>",
-      "<right_dtype>"
-    ]
-  },
-  "DIFFERENT_PANDAS_MULTIINDEX": {
-    "message": [
-      "MultiIndices are not almost equal:",
-      "Left:",
-      "<left>",
-      "<left_dtype>",
-      "Right:",
-      "<right>",
-      "<right_dtype>"
-    ]
-  },
-  "DIFFERENT_PANDAS_SERIES": {
-    "message": [
-      "Series are not almost equal:",
-      "Left:",
-      "<left>",
-      "<left_dtype>",
-      "Right:",
-      "<right>",
-      "<right_dtype>"
-    ]
-  },
-  "DIFFERENT_ROWS": {
-    "message": [
-      "<error_msg>"
-    ]
-  },
-  "DIFFERENT_SCHEMA": {
-    "message": [
-      "Schemas do not match.",
-      "--- actual",
-      "+++ expected",
-      "<error_msg>"
-    ]
-  },
-  "DISALLOWED_TYPE_FOR_CONTAINER": {
-    "message": [
-      "Argument `<arg_name>`(type: <arg_type>) should only contain a type in [<allowed_types>], got <item_type>"
-    ]
-  },
-  "DUPLICATED_FIELD_NAME_IN_ARROW_STRUCT": {
-    "message": [
-      "Duplicated field names in Arrow Struct are not allowed, got <field_names>"
-    ]
-  },
-  "ERROR_OCCURRED_WHILE_CALLING": {
-    "message": [
-      "An error occurred while calling <func_name>: <error_msg>."
-    ]
-  },
-  "FIELD_DATA_TYPE_UNACCEPTABLE": {
-    "message": [
-      "<data_type> can not accept object <obj> in type <obj_type>."
-    ]
-  },
-  "FIELD_DATA_TYPE_UNACCEPTABLE_WITH_NAME": {
-    "message": [
-      "<field_name>: <data_type> can not accept object <obj> in type <obj_type>."
-    ]
-  },
-  "FIELD_NOT_NULLABLE": {
-    "message": [
-      "Field is not nullable, but got None."
-    ]
-  },
-  "FIELD_NOT_NULLABLE_WITH_NAME": {
-    "message": [
-      "<field_name>: This field is not nullable, but got None."
-    ]
-  },
-  "FIELD_STRUCT_LENGTH_MISMATCH": {
-    "message": [
-      "Length of object (<object_length>) does not match with length of fields (<field_length>)."
-    ]
-  },
-  "FIELD_STRUCT_LENGTH_MISMATCH_WITH_NAME": {
-    "message": [
-      "<field_name>: Length of object (<object_length>) does not match with length of fields (<field_length>)."
-    ]
-  },
-  "FIELD_TYPE_MISMATCH": {
-    "message": [
-      "<obj> is not an instance of type <data_type>."
-    ]
-  },
-  "FIELD_TYPE_MISMATCH_WITH_NAME": {
-    "message": [
-      "<field_name>: <obj> is not an instance of type <data_type>."
-    ]
-  },
-  "HIGHER_ORDER_FUNCTION_SHOULD_RETURN_COLUMN": {
-    "message": [
-      "Function `<func_name>` should return Column, got <return_type>."
-    ]
-  },
-  "INCORRECT_CONF_FOR_PROFILE": {
-    "message": [
-      "`spark.python.profile` or `spark.python.profile.memory` configuration",
-      " must be set to `true` to enable Python profile."
-    ]
-  },
-  "INDEX_NOT_POSITIVE": {
-    "message": [
-      "Index must be positive, got '<index>'."
-    ]
-  },
-  "INDEX_OUT_OF_RANGE": {
-    "message": [
-      "<arg_name> index out of range, got '<index>'."
-    ]
-  },
-  "INVALID_ARROW_UDTF_RETURN_TYPE": {
-    "message": [
-      "The return type of the arrow-optimized Python UDTF should be of type 'pandas.DataFrame', but the '<func>' method returned a value of type <return_type> with value: <value>."
-    ]
-  },
-  "INVALID_BROADCAST_OPERATION": {
-    "message": [
-      "Broadcast can only be <operation> in driver."
-    ]
-  },
-  "INVALID_CALL_ON_UNRESOLVED_OBJECT": {
-    "message": [
-      "Invalid call to `<func_name>` on unresolved object."
-    ]
-  },
-  "INVALID_CONNECT_URL": {
-    "message": [
-      "Invalid URL for Spark Connect: <detail>"
-    ]
-  },
-  "INVALID_INTERVAL_CASTING": {
-    "message": [
-      "Interval <start_field> to <end_field> is invalid."
-    ]
-  },
-  "INVALID_ITEM_FOR_CONTAINER": {
-    "message": [
-      "All items in `<arg_name>` should be in <allowed_types>, got <item_type>."
-    ]
-  },
-  "INVALID_MULTIPLE_ARGUMENT_CONDITIONS": {
-    "message": [
-      "[{arg_names}] cannot be <condition>."
-    ]
-  },
-  "INVALID_NDARRAY_DIMENSION": {
-    "message": [
-      "NumPy array input should be of <dimensions> dimensions."
-    ]
-  },
-  "INVALID_NUMBER_OF_DATAFRAMES_IN_GROUP": {
-    "message": [
-      "Invalid number of dataframes in group <dataframes_in_group>."
-    ]
-  },
-  "INVALID_PANDAS_UDF": {
-    "message": [
-      "Invalid function: <detail>"
-    ]
-  },
-  "INVALID_PANDAS_UDF_TYPE": {
-    "message": [
-      "`<arg_name>` should be one of the values from PandasUDFType, got <arg_type>"
-    ]
-  },
-  "INVALID_RETURN_TYPE_FOR_ARROW_UDF": {
-    "message": [
-      "Grouped and Cogrouped map Arrow UDF should return StructType for <eval_type>, got <return_type>."
-    ]
-  },
-  "INVALID_RETURN_TYPE_FOR_PANDAS_UDF": {
-    "message": [
-      "Pandas UDF should return StructType for <eval_type>, got <return_type>."
-    ]
-  },
-  "INVALID_SESSION_UUID_ID": {
-    "message": [
-      "Parameter value <arg_name> must be a valid UUID format: <origin>"
-    ]
-  },
-  "INVALID_TIMEOUT_TIMESTAMP": {
-    "message": [
-      "Timeout timestamp (<timestamp>) cannot be earlier than the current watermark (<watermark>)."
-    ]
-  },
-  "INVALID_TYPE": {
-    "message": [
-      "Argument `<arg_name>` should not be a <arg_type>."
-    ]
-  },
-  "INVALID_TYPENAME_CALL": {
-    "message": [
-      "StructField does not have typeName. Use typeName on its type explicitly instead."
-    ]
-  },
-  "INVALID_TYPE_DF_EQUALITY_ARG": {
-    "message": [
-      "Expected type <expected_type> for `<arg_name>` but got type <actual_type>."
-    ]
-  },
-  "INVALID_UDF_EVAL_TYPE": {
-    "message": [
-      "Eval type for UDF must be <eval_type>."
-    ]
-  },
-  "INVALID_UDTF_BOTH_RETURN_TYPE_AND_ANALYZE": {
-    "message": [
-      "The UDTF '<name>' is invalid. It has both its return type and an 'analyze' attribute. Please make it have one of either the return type or the 'analyze' static method in '<name>' and try again."
-    ]
-  },
-  "INVALID_UDTF_EVAL_TYPE": {
-    "message": [
-      "The eval type for the UDTF '<name>' is invalid. It must be one of <eval_type>."
-    ]
-  },
-  "INVALID_UDTF_HANDLER_TYPE": {
-    "message": [
-      "The UDTF is invalid. The function handler must be a class, but got '<type>'. Please provide a class as the function handler."
-    ]
-  },
-  "INVALID_UDTF_NO_EVAL": {
-    "message": [
-      "The UDTF '<name>' is invalid. It does not implement the required 'eval' method. Please implement the 'eval' method in '<name>' and try again."
-    ]
-  },
-  "INVALID_UDTF_RETURN_TYPE": {
-    "message": [
-      "The UDTF '<name>' is invalid. It does not specify its return type or implement the required 'analyze' static method. Please specify the return type or implement the 'analyze' static method in '<name>' and try again."
-    ]
-  },
-  "INVALID_WHEN_USAGE": {
-    "message": [
-      "when() can only be applied on a Column previously generated by when() function, and cannot be applied once otherwise() is applied."
-    ]
-  },
-  "INVALID_WINDOW_BOUND_TYPE": {
-    "message": [
-      "Invalid window bound type: <window_bound_type>."
-    ]
-  },
-  "JAVA_GATEWAY_EXITED": {
-    "message": [
-      "Java gateway process exited before sending its port number."
-    ]
-  },
-  "JVM_ATTRIBUTE_NOT_SUPPORTED": {
-    "message": [
-      "Attribute `<attr_name>` is not supported in Spark Connect as it depends on the JVM. If you need to use this attribute, do not use Spark Connect when creating your session. Visit https://spark.apache.org/docs/latest/sql-getting-started.html#starting-point-sparksession for creating regular Spark Session in detail."
-    ]
-  },
-  "KEY_NOT_EXISTS": {
-    "message": [
-      "Key `<key>` is not exists."
-    ]
-  },
-  "KEY_VALUE_PAIR_REQUIRED": {
-    "message": [
-      "Key-value pair or a list of pairs is required."
-    ]
-  },
-  "LENGTH_SHOULD_BE_THE_SAME": {
-    "message": [
-      "<arg1> and <arg2> should be of the same length, got <arg1_length> and <arg2_length>."
-    ]
-  },
-  "MASTER_URL_NOT_SET": {
-    "message": [
-      "A master URL must be set in your configuration."
-    ]
-  },
-  "MISSING_LIBRARY_FOR_PROFILER": {
-    "message": [
-      "Install the 'memory_profiler' library in the cluster to enable memory profiling."
-    ]
-  },
-  "MISSING_VALID_PLAN": {
-    "message": [
-      "Argument to <operator> does not contain a valid plan."
-    ]
-  },
-  "MIXED_TYPE_REPLACEMENT": {
-    "message": [
-      "Mixed type replacements are not supported."
-    ]
-  },
-  "NEGATIVE_VALUE": {
-    "message": [
-      "Value for `<arg_name>` must be greater than or equal to 0, got '<arg_value>'."
-    ]
-  },
-  "NOT_BOOL": {
-    "message": [
-      "Argument `<arg_name>` should be a bool, got <arg_type>."
-    ]
-  },
-  "NOT_BOOL_OR_DICT_OR_FLOAT_OR_INT_OR_LIST_OR_STR_OR_TUPLE": {
-    "message": [
-      "Argument `<arg_name>` should be a bool, dict, float, int, str or tuple, got <arg_type>."
-    ]
-  },
-  "NOT_BOOL_OR_DICT_OR_FLOAT_OR_INT_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a bool, dict, float, int or str, got <arg_type>."
-    ]
-  },
-  "NOT_BOOL_OR_FLOAT_OR_INT": {
-    "message": [
-      "Argument `<arg_name>` should be a bool, float or int, got <arg_type>."
-    ]
-  },
-  "NOT_BOOL_OR_FLOAT_OR_INT_OR_LIST_OR_NONE_OR_STR_OR_TUPLE": {
-    "message": [
-      "Argument `<arg_name>` should be a bool, float, int, list, None, str or tuple, got <arg_type>."
-    ]
-  },
-  "NOT_BOOL_OR_FLOAT_OR_INT_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a bool, float, int or str, got <arg_type>."
-    ]
-  },
-  "NOT_BOOL_OR_LIST": {
-    "message": [
-      "Argument `<arg_name>` should be a bool or list, got <arg_type>."
-    ]
-  },
-  "NOT_BOOL_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a bool or str, got <arg_type>."
-    ]
-  },
-  "NOT_CALLABLE": {
-    "message": [
-      "Argument `<arg_name>` should be a callable, got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN": {
-    "message": [
-      "Argument `<arg_name>` should be a Column, got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN_OR_DATATYPE_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a Column, str or DataType, but got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN_OR_FLOAT_OR_INT_OR_LIST_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a Column, float, integer, list or string, got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN_OR_INT": {
-    "message": [
-      "Argument `<arg_name>` should be a Column or int, got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN_OR_INT_OR_LIST_OR_STR_OR_TUPLE": {
-    "message": [
-      "Argument `<arg_name>` should be a Column, int, list, str or tuple, got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN_OR_INT_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a Column, int or str, got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN_OR_LIST_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a Column, list or str, got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a Column or str, got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN_OR_STR_OR_STRUCT": {
-    "message": [
-      "Argument `<arg_name>` should be a StructType, Column or str, got <arg_type>."
-    ]
-  },
-  "NOT_DATAFRAME": {
-    "message": [
-      "Argument `<arg_name>` should be a DataFrame, got <arg_type>."
-    ]
-  },
-  "NOT_DATATYPE_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a DataType or str, got <arg_type>."
-    ]
-  },
-  "NOT_DICT": {
-    "message": [
-      "Argument `<arg_name>` should be a dict, got <arg_type>."
-    ]
-  },
-  "NOT_EXPRESSION": {
-    "message": [
-      "Argument `<arg_name>` should be an Expression, got <arg_type>."
-    ]
-  },
-  "NOT_FLOAT_OR_INT": {
-    "message": [
-      "Argument `<arg_name>` should be a float or int, got <arg_type>."
-    ]
-  },
-  "NOT_FLOAT_OR_INT_OR_LIST_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a float, int, list or str, got <arg_type>."
-    ]
-  },
-  "NOT_IMPLEMENTED": {
-    "message": [
-      "<feature> is not implemented."
-    ]
-  },
-  "NOT_INT": {
-    "message": [
-      "Argument `<arg_name>` should be an int, got <arg_type>."
-    ]
-  },
-  "NOT_INT_OR_SLICE_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be an int, slice or str, got <arg_type>."
-    ]
-  },
-  "NOT_IN_BARRIER_STAGE": {
-    "message": [
-      "It is not in a barrier stage."
-    ]
-  },
-  "NOT_ITERABLE": {
-    "message": [
-      "<objectName> is not iterable."
-    ]
-  },
-  "NOT_LIST": {
-    "message": [
-      "Argument `<arg_name>` should be a list, got <arg_type>."
-    ]
-  },
-  "NOT_LIST_OF_COLUMN": {
-    "message": [
-      "Argument `<arg_name>` should be a list[Column]."
-    ]
-  },
-  "NOT_LIST_OF_COLUMN_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a list[Column]."
-    ]
-  },
-  "NOT_LIST_OF_FLOAT_OR_INT": {
-    "message": [
-      "Argument `<arg_name>` should be a list[float, int], got <arg_type>."
-    ]
-  },
-  "NOT_LIST_OF_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a list[str], got <arg_type>."
-    ]
-  },
-  "NOT_LIST_OR_NONE_OR_STRUCT": {
-    "message": [
-      "Argument `<arg_name>` should be a list, None or StructType, got <arg_type>."
-    ]
-  },
-  "NOT_LIST_OR_STR_OR_TUPLE": {
-    "message": [
-      "Argument `<arg_name>` should be a list, str or tuple, got <arg_type>."
-    ]
-  },
-  "NOT_LIST_OR_TUPLE": {
-    "message": [
-      "Argument `<arg_name>` should be a list or tuple, got <arg_type>."
-    ]
-  },
-  "NOT_NUMERIC_COLUMNS": {
-    "message": [
-      "Numeric aggregation function can only be applied on numeric columns, got <invalid_columns>."
-    ]
-  },
-  "NOT_OBSERVATION_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be an Observation or str, got <arg_type>."
-    ]
-  },
-  "NOT_SAME_TYPE": {
-    "message": [
-      "Argument `<arg_name1>` and `<arg_name2>` should be the same type, got <arg_type1> and <arg_type2>."
-    ]
-  },
-  "NOT_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a str, got <arg_type>."
-    ]
-  },
-  "NOT_STRUCT": {
-    "message": [
-      "Argument `<arg_name>` should be a struct type, got <arg_type>."
-    ]
-  },
-  "NOT_STR_OR_LIST_OF_RDD": {
-    "message": [
-      "Argument `<arg_name>` should be a str or list[RDD], got <arg_type>."
-    ]
-  },
-  "NOT_STR_OR_STRUCT": {
-    "message": [
-      "Argument `<arg_name>` should be a str or struct type, got <arg_type>."
-    ]
-  },
-  "NOT_WINDOWSPEC": {
-    "message": [
-      "Argument `<arg_name>` should be a WindowSpec, got <arg_type>."
-    ]
-  },
-  "NO_ACTIVE_EXCEPTION": {
-    "message": [
-      "No active exception."
-    ]
-  },
-  "NO_ACTIVE_OR_DEFAULT_SESSION": {
-    "message": [
-      "No active or default Spark session found. Please create a new Spark session before running the code."
-    ]
-  },
-  "NO_ACTIVE_SESSION": {
-    "message": [
-      "No active Spark session found. Please create a new Spark session before running the code."
-    ]
-  },
-  "NO_OBSERVE_BEFORE_GET": {
-    "message": [
-      "Should observe by calling `DataFrame.observe` before `get`."
-    ]
-  },
-  "NO_SCHEMA_AND_DRIVER_DEFAULT_SCHEME": {
-    "message": [
-      "Only allows <arg_name> to be a path without scheme, and Spark Driver should use the default scheme to determine the destination file system."
-    ]
-  },
-  "ONLY_ALLOWED_FOR_SINGLE_COLUMN": {
-    "message": [
-      "Argument `<arg_name>` can only be provided for a single column."
-    ]
-  },
-  "ONLY_ALLOW_SINGLE_TRIGGER": {
-    "message": [
-      "Only a single trigger is allowed."
-    ]
-  },
-  "ONLY_SUPPORTED_WITH_SPARK_CONNECT": {
-    "message": [
-      "<feature> is only supported with Spark Connect; however, the current Spark session does not use Spark Connect."
-    ]
-  },
-  "PACKAGE_NOT_INSTALLED": {
-    "message": [
-      "<package_name> >= <minimum_version> must be installed; however, it was not found."
-    ]
-  },
-  "PIPE_FUNCTION_EXITED": {
-    "message": [
-      "Pipe function `<func_name>` exited with error code <error_code>."
-    ]
-  },
-  "PYTHON_HASH_SEED_NOT_SET": {
-    "message": [
-      "Randomness of hash of string should be disabled via PYTHONHASHSEED."
-    ]
-  },
-  "PYTHON_STREAMING_DATA_SOURCE_RUNTIME_ERROR": {
-    "message": [
-      "Failed when running Python streaming data source: <msg>"
-    ]
-  },
-  "PYTHON_VERSION_MISMATCH": {
-    "message": [
-      "Python in worker has different version: <worker_version> than that in driver: <driver_version>, PySpark cannot run with different minor versions.",
-      "Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set."
-    ]
-  },
-  "RDD_TRANSFORM_ONLY_VALID_ON_DRIVER": {
-    "message": [
-      "It appears that you are attempting to broadcast an RDD or reference an RDD from an ",
-      "action or transformation. RDD transformations and actions can only be invoked by the ",
-      "driver, not inside of other transformations; for example, ",
-      "rdd1.map(lambda x: rdd2.values.count() * x) is invalid because the values ",
-      "transformation and count action cannot be performed inside of the rdd1.map ",
-      "transformation. For more information, see SPARK-5063."
-    ]
-  },
-  "READ_ONLY": {
-    "message": [
-      "<object> is read-only."
-    ]
-  },
-  "RESPONSE_ALREADY_RECEIVED": {
-    "message": [
-      "OPERATION_NOT_FOUND on the server but responses were already received from it."
-    ]
-  },
-  "RESULT_COLUMNS_MISMATCH_FOR_ARROW_UDF": {
-    "message": [
-      "Column names of the returned pyarrow.Table do not match specified schema.<missing><extra>"
-    ]
-  },
-  "RESULT_COLUMNS_MISMATCH_FOR_PANDAS_UDF": {
-    "message": [
-      "Column names of the returned pandas.DataFrame do not match specified schema.<missing><extra>"
-    ]
-  },
-  "RESULT_LENGTH_MISMATCH_FOR_PANDAS_UDF": {
-    "message": [
-      "Number of columns of the returned pandas.DataFrame doesn't match specified schema. Expected: <expected> Actual: <actual>"
-    ]
-  },
-  "RESULT_LENGTH_MISMATCH_FOR_SCALAR_ITER_PANDAS_UDF": {
-    "message": [
-      "The length of output in Scalar iterator pandas UDF should be the same with the input's; however, the length of output was <output_length> and the length of input was <input_length>."
-    ]
-  },
-  "RESULT_TYPE_MISMATCH_FOR_ARROW_UDF": {
-    "message": [
-      "Columns do not match in their data type: <mismatch>."
-    ]
-  },
-  "RETRIES_EXCEEDED": {
-    "message": [
-      "The maximum number of retries has been exceeded."
-    ]
-  },
-  "REUSE_OBSERVATION": {
-    "message": [
-      "An Observation can be used with a DataFrame only once."
-    ]
-  },
-  "SCHEMA_MISMATCH_FOR_PANDAS_UDF": {
-    "message": [
-      "Result vector from pandas_udf was not the required length: expected <expected>, got <actual>."
-    ]
-  },
-  "SESSION_ALREADY_EXIST": {
-    "message": [
-      "Cannot start a remote Spark session because there is a regular Spark session already running."
-    ]
-  },
-  "SESSION_NEED_CONN_STR_OR_BUILDER": {
-    "message": [
-      "Needs either connection string or channelBuilder (mutually exclusive) to create a new SparkSession."
-    ]
-  },
-  "SESSION_NOT_SAME": {
-    "message": [
-      "Both Datasets must belong to the same SparkSession."
-    ]
-  },
-  "SESSION_OR_CONTEXT_EXISTS": {
-    "message": [
-      "There should not be an existing Spark Session or Spark Context."
-    ]
-  },
-  "SESSION_OR_CONTEXT_NOT_EXISTS": {
-    "message": [
-      "SparkContext or SparkSession should be created first."
-    ]
-  },
-  "SLICE_WITH_STEP": {
-    "message": [
-      "Slice with step is not supported."
-    ]
-  },
-  "STATE_NOT_EXISTS": {
-    "message": [
-      "State is either not defined or has already been removed."
-    ]
-  },
-  "STOP_ITERATION_OCCURRED": {
-    "message": [
-      "Caught StopIteration thrown from user's code; failing the task: <exc>"
-    ]
-  },
-  "STOP_ITERATION_OCCURRED_FROM_SCALAR_ITER_PANDAS_UDF": {
-    "message": [
-      "pandas iterator UDF should exhaust the input iterator."
-    ]
-  },
-  "STREAMING_CONNECT_SERIALIZATION_ERROR": {
-    "message": [
-      "Cannot serialize the function `<name>`. If you accessed the Spark session, or a DataFrame defined outside of the function, or any object that contains a Spark session, please be aware that they are not allowed in Spark Connect. For `foreachBatch`, please access the Spark session using `df.sparkSession`, where `df` is the first parameter in your `foreachBatch` function. For `StreamingQueryListener`, please access the Spark session using `self.spark`. For details please check out the PySpark doc for `foreachBatch` and `StreamingQueryListener`."
-    ]
-  },
-  "TEST_CLASS_NOT_COMPILED": {
-    "message": [
-      "<test_class_path> doesn't exist. Spark sql test classes are not compiled."
-    ]
-  },
-  "TOO_MANY_VALUES": {
-    "message": [
-      "Expected <expected> values for `<item>`, got <actual>."
-    ]
-  },
-  "TYPE_HINT_SHOULD_BE_SPECIFIED": {
-    "message": [
-      "Type hints for <target> should be specified; however, got <sig>."
-    ]
-  },
-  "UDF_RETURN_TYPE": {
-    "message": [
-      "Return type of the user-defined function should be <expected>, but is <actual>."
-    ]
-  },
-  "UDTF_ARROW_TYPE_CAST_ERROR": {
-    "message": [
-      "Cannot convert the output value of the column '<col_name>' with type '<col_type>' to the specified return type of the column: '<arrow_type>'. Please check if the data types match and try again."
-    ]
-  },
-  "UDTF_CONSTRUCTOR_INVALID_IMPLEMENTS_ANALYZE_METHOD": {
-    "message": [
-      "Failed to evaluate the user-defined table function '<name>' because its constructor is invalid: the function implements the 'analyze' method, but its constructor has more than two arguments (including the 'self' reference). Please update the table function so that its constructor accepts exactly one 'self' argument, or one 'self' argument plus another argument for the result of the 'analyze' method, and try the query again."
-    ]
-  },
-  "UDTF_CONSTRUCTOR_INVALID_NO_ANALYZE_METHOD": {
-    "message": [
-      "Failed to evaluate the user-defined table function '<name>' because its constructor is invalid: the function does not implement the 'analyze' method, and its constructor has more than one argument (including the 'self' reference). Please update the table function so that its constructor accepts exactly one 'self' argument, and try the query again."
-    ]
-  },
-  "UDTF_EVAL_METHOD_ARGUMENTS_DO_NOT_MATCH_SIGNATURE": {
-    "message": [
-      "Failed to evaluate the user-defined table function '<name>' because the function arguments did not match the expected signature of the 'eval' method (<reason>). Please update the query so that this table function call provides arguments matching the expected signature, or else update the table function so that its 'eval' method accepts the provided arguments, and then try the query again."
-    ]
-  },
-  "UDTF_EXEC_ERROR": {
-    "message": [
-      "User defined table function encountered an error in the '<method_name>' method: <error>"
-    ]
-  },
-  "UDTF_INVALID_OUTPUT_ROW_TYPE": {
-    "message": [
-      "The type of an individual output row in the '<func>' method of the UDTF is invalid. Each row should be a tuple, list, or dict, but got '<type>'. Please make sure that the output rows are of the correct type."
-    ]
-  },
-  "UDTF_RETURN_NOT_ITERABLE": {
-    "message": [
-      "The return value of the '<func>' method of the UDTF is invalid. It should be an iterable (e.g., generator or list), but got '<type>'. Please make sure that the UDTF returns one of these types."
-    ]
-  },
-  "UDTF_RETURN_SCHEMA_MISMATCH": {
-    "message": [
-      "The number of columns in the result does not match the specified schema. Expected column count: <expected>, Actual column count: <actual>. Please make sure the values returned by the '<func>' method have the same number of columns as specified in the output schema."
-    ]
-  },
-  "UDTF_RETURN_TYPE_MISMATCH": {
-    "message": [
-      "Mismatch in return type for the UDTF '<name>'. Expected a 'StructType', but got '<return_type>'. Please ensure the return type is a correctly formatted StructType."
-    ]
-  },
-  "UDTF_SERIALIZATION_ERROR": {
-    "message": [
-      "Cannot serialize the UDTF '<name>': <message>"
-    ]
-  },
-  "UNEXPECTED_RESPONSE_FROM_SERVER": {
-    "message": [
-      "Unexpected response from iterator server."
-    ]
-  },
-  "UNEXPECTED_TUPLE_WITH_STRUCT": {
-    "message": [
-      "Unexpected tuple <tuple> with StructType."
-    ]
-  },
-  "UNKNOWN_EXPLAIN_MODE": {
-    "message": [
-      "Unknown explain mode: '<explain_mode>'. Accepted explain modes are 'simple', 'extended', 'codegen', 'cost', 'formatted'."
-    ]
-  },
-  "UNKNOWN_INTERRUPT_TYPE": {
-    "message": [
-      "Unknown interrupt type: '<interrupt_type>'. Accepted interrupt types are 'all'."
-    ]
-  },
-  "UNKNOWN_RESPONSE": {
-    "message": [
-      "Unknown response: <response>."
-    ]
-  },
-  "UNKNOWN_VALUE_FOR": {
-    "message": [
-      "Unknown value for `<var>`."
-    ]
-  },
-  "UNSUPPORTED_DATA_TYPE": {
-    "message": [
-      "Unsupported DataType `<data_type>`."
-    ]
-  },
-  "UNSUPPORTED_DATA_TYPE_FOR_ARROW": {
-    "message": [
-      "Single data type <data_type> is not supported with Arrow."
-    ]
-  },
-  "UNSUPPORTED_DATA_TYPE_FOR_ARROW_CONVERSION": {
-    "message": [
-      "<data_type> is not supported in conversion to Arrow."
-    ]
-  },
-  "UNSUPPORTED_DATA_TYPE_FOR_ARROW_VERSION": {
-    "message": [
-      "<data_type> is only supported with pyarrow 2.0.0 and above."
-    ]
-  },
-  "UNSUPPORTED_JOIN_TYPE": {
-    "message": [
-      "Unsupported join type: <join_type>. Supported join types include: 'inner', 'outer', 'full', 'fullouter', 'full_outer', 'leftouter', 'left', 'left_outer', 'rightouter', 'right', 'right_outer', 'leftsemi', 'left_semi', 'semi', 'leftanti', 'left_anti', 'anti', 'cross'."
-    ]
-  },
-  "UNSUPPORTED_LITERAL": {
-    "message": [
-      "Unsupported Literal '<literal>'."
-    ]
-  },
-  "UNSUPPORTED_LOCAL_CONNECTION_STRING": {
-    "message": [
-      "Creating new SparkSessions with `local` connection string is not supported."
-    ]
-  },
-  "UNSUPPORTED_NUMPY_ARRAY_SCALAR": {
-    "message": [
-      "The type of array scalar '<dtype>' is not supported."
-    ]
-  },
-  "UNSUPPORTED_OPERATION": {
-    "message": [
-      "<operation> is not supported."
-    ]
-  },
-  "UNSUPPORTED_PACKAGE_VERSION": {
-    "message": [
-      "<package_name> >= <minimum_version> must be installed; however, your version is <current_version>."
-    ]
-  },
-  "UNSUPPORTED_PARAM_TYPE_FOR_HIGHER_ORDER_FUNCTION": {
-    "message": [
-      "Function `<func_name>` should use only POSITIONAL or POSITIONAL OR KEYWORD arguments."
-    ]
-  },
-  "UNSUPPORTED_SIGNATURE": {
-    "message": [
-      "Unsupported signature: <signature>."
-    ]
-  },
-  "UNSUPPORTED_WITH_ARROW_OPTIMIZATION": {
-    "message": [
-      "<feature> is not supported with Arrow optimization enabled in Python UDFs. Disable 'spark.sql.execution.pythonUDF.arrow.enabled' to workaround."
-    ]
-  },
-  "VALUE_ALLOWED": {
-    "message": [
-      "Value for `<arg_name>` does not allow <disallowed_value>."
-    ]
-  },
-  "VALUE_NOT_ACCESSIBLE": {
-    "message": [
-      "Value `<value>` cannot be accessed inside tasks."
-    ]
-  },
-  "VALUE_NOT_ALLOWED": {
-    "message": [
-      "Value for `<arg_name>` has to be amongst the following values: <allowed_values>."
-    ]
-  },
-  "VALUE_NOT_ANY_OR_ALL": {
-    "message": [
-      "Value for `<arg_name>` must be 'any' or 'all', got '<arg_value>'."
-    ]
-  },
-  "VALUE_NOT_BETWEEN": {
-    "message": [
-      "Value for `<arg_name>` must be between <min> and <max>."
-    ]
-  },
-  "VALUE_NOT_NON_EMPTY_STR": {
-    "message": [
-      "Value for `<arg_name>` must be a non-empty string, got '<arg_value>'."
-    ]
-  },
-  "VALUE_NOT_PEARSON": {
-    "message": [
-      "Value for `<arg_name>` only supports the 'pearson', got '<arg_value>'."
-    ]
-  },
-  "VALUE_NOT_PLAIN_COLUMN_REFERENCE": {
-    "message": [
-      "Value `<val>` in `<field_name>` should be a plain column reference such as `df.col` or `col('column')`."
-    ]
-  },
-  "VALUE_NOT_POSITIVE": {
-    "message": [
-      "Value for `<arg_name>` must be positive, got '<arg_value>'."
-    ]
-  },
-  "VALUE_NOT_TRUE": {
-    "message": [
-      "Value for `<arg_name>` must be True, got '<arg_value>'."
-    ]
-  },
-  "VALUE_OUT_OF_BOUNDS": {
-    "message": [
-      "Value for `<arg_name>` must be between <lower_bound> and <upper_bound> (inclusive), got <actual>"
-    ]
-  },
-  "WRONG_NUM_ARGS_FOR_HIGHER_ORDER_FUNCTION": {
-    "message": [
-      "Function `<func_name>` should take between 1 and 3 arguments, but the provided function takes <num_args>."
-    ]
-  },
-  "WRONG_NUM_COLUMNS": {
-    "message": [
-      "Function `<func_name>` should take at least <num_cols> columns."
-    ]
-  },
-  "ZERO_INDEX": {
-    "message": [
-      "Index must be non-zero."
-    ]
-  }
-}
-'''
-
+# Note: Though we call them "error classes" here, the proper name is "error conditions",
+#   hence why the name of the JSON file is different.
+#   For more information, please see: https://issues.apache.org/jira/browse/SPARK-46810
+#   This discrepancy will be resolved as part of: https://issues.apache.org/jira/browse/SPARK-47429
+# Note: When we drop support for Python 3.8, we should migrate from importlib.resources.read_text()

Review Comment:
   We dropped Python 3.8 recently :-).
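
   For reference, here is a minimal sketch of the newer `importlib.resources.files()`
   API that the note above presumably alludes to migrating toward (available since
   Python 3.9; the package and file names just mirror this PR, not any final code):

   ```python
   import json
   from importlib.resources import files

   # Traversable API: resolve the packaged JSON file and read it as text.
   text = files("pyspark.errors").joinpath("error-conditions.json").read_text()
   ERROR_CLASSES_MAP = json.loads(text)
   ```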





Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

Posted by "nchammas (via GitHub)" <gi...@apache.org>.
nchammas commented on code in PR #44920:
URL: https://github.com/apache/spark/pull/44920#discussion_r1586990733


##########
python/MANIFEST.in:
##########
@@ -14,13 +14,18 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-global-exclude *.py[cod] __pycache__ .DS_Store
+# Reference: https://setuptools.pypa.io/en/latest/userguide/miscellaneous.html
+
+graft pyspark

Review Comment:
   OK. Are you planning to address this in #46331 (or some other PR), or would you like me to take care of it?





Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #44920:
URL: https://github.com/apache/spark/pull/44920#discussion_r1468965413


##########
python/pyspark/errors/error-conditions.json:
##########
@@ -0,0 +1,1096 @@
+{

Review Comment:
   The problem is that it has to be packaged together and be uploadable to PyPI. It would be great if we could make sure that still works.





Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on PR #44920:
URL: https://github.com/apache/spark/pull/44920#issuecomment-1914106426

   > This is for a separate potential PR, but if it were possible to use the "main" error JSON files from Scala in PySpark automatically, would we want to do that?
   
   I don't think so. As I recall, the main reason for not doing it was that, as you said, the error structure on the PySpark side is different from the error structure on the JVM side.
   
   > I will be consolidating the various sql-error-* pages. I will tag you in those PRs when I open them.
   
   +1
   




Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

Posted by "nchammas (via GitHub)" <gi...@apache.org>.
nchammas commented on PR #44920:
URL: https://github.com/apache/spark/pull/44920#issuecomment-2027389233

   OK, I tested that as well and updated the PR description accordingly.
   
   I also tweaked the syntax highlighting for that bit of documentation you linked to, because it was off. This is how it currently looks:
   
   ![Screenshot 2024-03-29 at 11 30 26 AM](https://github.com/apache/spark/assets/1039369/4ee9b28f-768d-478f-980e-3937fa533029)
   
   Note the weird italicization and missing `*`.




Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #44920:
URL: https://github.com/apache/spark/pull/44920#discussion_r1584090772


##########
python/pyspark/errors/exceptions/__init__.py:
##########
@@ -18,39 +18,15 @@
 
 def _write_self() -> None:
     import json
+    from pathlib import Path
     from pyspark.errors import error_classes
 
-    with open("python/pyspark/errors/error_classes.py", "w") as f:
-        error_class_py_file = """#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#    http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-
-# NOTE: Automatically sort this file via
-# - cd $SPARK_HOME
-# - bin/pyspark
-# - from pyspark.errors.exceptions import _write_self; _write_self()
-import json
-
-
-ERROR_CLASSES_JSON = '''
-%s
-'''
+    ERRORS_DIR = Path(__file__).parents[1]
 
-ERROR_CLASSES_MAP = json.loads(ERROR_CLASSES_JSON)
-""" % json.dumps(
-            error_classes.ERROR_CLASSES_MAP, sort_keys=True, indent=2
+    with open(ERRORS_DIR / "error-conditions.json", "w") as f:

Review Comment:
   @nchammas I believe it worked because you built Spark from the project root directory, so the `ERRORS_DIR` directory exists ...





Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

Posted by "nchammas (via GitHub)" <gi...@apache.org>.
nchammas commented on PR #44920:
URL: https://github.com/apache/spark/pull/44920#issuecomment-1913976049

   The build is still running, but the [pyspark-core tests are passing](https://github.com/nchammas/spark/actions/runs/7691032575/job/20955751265). I believe `importlib.resources` is what we need to load data files packaged in the distribution we upload to PyPI.
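
   A minimal sketch of that approach, assuming the JSON ships inside the
   `pyspark.errors` package (the names mirror this PR; this is not the PR's exact code):

   ```python
   import json
   import importlib.resources

   # The legacy API (deprecated since Python 3.11) works for regular,
   # zipped, and otherwise read-only installs alike.
   text = importlib.resources.read_text("pyspark.errors", "error-conditions.json")
   ERROR_CLASSES_MAP = json.loads(text)
   ```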




Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #44920:
URL: https://github.com/apache/spark/pull/44920#issuecomment-1913861869

   @itholic can you take a look please? I remember you took a look and couldn't find a good way to upload the JSON.




Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

Posted by "nchammas (via GitHub)" <gi...@apache.org>.
nchammas commented on PR #44920:
URL: https://github.com/apache/spark/pull/44920#issuecomment-2026612169

   Hmm, so when people install this ZIP, how exactly do they do it? It does not install cleanly like the ZIP under `python/dist/`:
   
   ```
   $ pip install .../spark/python/lib/pyspark.zip
   Processing .../spark/python/lib/pyspark.zip
   ERROR: file:///.../spark/python/lib/pyspark.zip does not appear to be a Python project:
     neither 'setup.py' nor 'pyproject.toml' found.
   ```
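
   For context, `python/lib/pyspark.zip` is not normally pip-installed at all: the
   launcher scripts put it on the import path instead. A rough illustration of the
   mechanism, with a hypothetical location:

   ```python
   import sys

   # bin/pyspark effectively does this via PYTHONPATH; pyspark is then
   # imported through zipimport, so packaged data files must be readable
   # from inside the archive rather than from a regular directory.
   sys.path.insert(0, "/path/to/spark/python/lib/pyspark.zip")
   import pyspark
   ```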




Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

Posted by "nchammas (via GitHub)" <gi...@apache.org>.
nchammas commented on PR #44920:
URL: https://github.com/apache/spark/pull/44920#issuecomment-2084153710

   Friendly ping @HyukjinKwon.




Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

Posted by "nchammas (via GitHub)" <gi...@apache.org>.
nchammas commented on code in PR #44920:
URL: https://github.com/apache/spark/pull/44920#discussion_r1586985679


##########
python/MANIFEST.in:
##########
@@ -14,13 +14,18 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-global-exclude *.py[cod] __pycache__ .DS_Store
+# Reference: https://setuptools.pypa.io/en/latest/userguide/miscellaneous.html
+
+graft pyspark

Review Comment:
   Yes, `graft` pulls everything.
   
   We can try to just include what we think we need, but it's probably safer (and easier) in the long run to instead exclude what we don't want to package, like tests. Would that work for you?
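
   A hypothetical sketch of that exclude-over-include style (the directory names are
   illustrative, not the PR's final `MANIFEST.in` contents):

   ```
   # Pull in the whole package, then carve out what should not ship.
   graft pyspark
   prune pyspark/tests
   global-exclude *.py[cod] __pycache__ .DS_Store
   ```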





Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #44920:
URL: https://github.com/apache/spark/pull/44920#issuecomment-2024304980

   Just to make sure, does it work when you install PySpark as a ZIP file? For example, the distribution downloaded from https://spark.apache.org/downloads.html ships PySpark as a ZIP file.




Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

Posted by "nchammas (via GitHub)" <gi...@apache.org>.
nchammas commented on PR #44920:
URL: https://github.com/apache/spark/pull/44920#issuecomment-2025495693

   I've updated the PR description with this additional ZIP test (test 4).
   
   Just to confirm, the ZIP that gets uploaded to the site is the one under `python/dist/`. Is that correct?




Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

Posted by "nchammas (via GitHub)" <gi...@apache.org>.
nchammas commented on PR #44920:
URL: https://github.com/apache/spark/pull/44920#issuecomment-2016860163

   We have agreed in SPARK-46810 to rename "error class" to "error condition", so this PR is unblocked since we know we won't need to rename the new `error-conditions.json` file.
   
   The work to rename all instances of "error class" to "error condition" across the board will happen in SPARK-46810 and SPARK-47429. I would like to keep this PR focused on simply moving the Python error conditions into a JSON file.
   
   @HyukjinKwon - I believe this PR is ready to go. Do you have any outstanding concerns?




Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

Posted by "nchammas (via GitHub)" <gi...@apache.org>.
nchammas commented on code in PR #44920:
URL: https://github.com/apache/spark/pull/44920#discussion_r1470557657


##########
python/pyspark/errors/exceptions/__init__.py:
##########
@@ -18,39 +18,15 @@
 
 def _write_self() -> None:
     import json
+    from pathlib import Path
     from pyspark.errors import error_classes
 
-    with open("python/pyspark/errors/error_classes.py", "w") as f:
-        error_class_py_file = """#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#    http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-
-# NOTE: Automatically sort this file via
-# - cd $SPARK_HOME
-# - bin/pyspark
-# - from pyspark.errors.exceptions import _write_self; _write_self()
-import json
-
-
-ERROR_CLASSES_JSON = '''
-%s
-'''
+    ERRORS_DIR = Path(__file__).parents[1]

Review Comment:
   I don't know if this particular line will work when PySpark is packaged for distribution, but that's OK because `_write_self()` is meant for use by developers who are writing to the JSON file during development. Right?
   
   I don't think we want to use `importlib.resources` here because that's for loading resources from a potentially read-only volume, which may be the case when PySpark is installed from a ZIP file, for example. Since this is a development tool, we need a functioning filesystem with write access, so `__file__` will work fine.
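
   A quick illustration of what `Path(__file__).parents[1]` resolves to for this
   module (the path is made up for the example):

   ```python
   from pathlib import Path

   p = Path("/src/spark/python/pyspark/errors/exceptions/__init__.py")
   print(p.parents[0])  # /src/spark/python/pyspark/errors/exceptions
   print(p.parents[1])  # /src/spark/python/pyspark/errors  (where the JSON is written)
   ```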



##########
python/pyspark/errors/error_classes.py:
##########
@@ -15,1110 +15,14 @@
 # limitations under the License.
 #
 
-# NOTE: Automatically sort this file via
-# - cd $SPARK_HOME
-# - bin/pyspark
-# - from pyspark.errors.exceptions import _write_self; _write_self()
 import json
+import importlib.resources
 
-
-ERROR_CLASSES_JSON = '''
-{
-  "APPLICATION_NAME_NOT_SET": {
-    "message": [
-      "An application name must be set in your configuration."
-    ]
-  },
-  "ARGUMENT_REQUIRED": {
-    "message": [
-      "Argument `<arg_name>` is required when <condition>."
-    ]
-  },
-  "ARROW_LEGACY_IPC_FORMAT": {
-    "message": [
-      "Arrow legacy IPC format is not supported in PySpark, please unset ARROW_PRE_0_15_IPC_FORMAT."
-    ]
-  },
-  "ATTRIBUTE_NOT_CALLABLE": {
-    "message": [
-      "Attribute `<attr_name>` in provided object `<obj_name>` is not callable."
-    ]
-  },
-  "ATTRIBUTE_NOT_SUPPORTED": {
-    "message": [
-      "Attribute `<attr_name>` is not supported."
-    ]
-  },
-  "AXIS_LENGTH_MISMATCH": {
-    "message": [
-      "Length mismatch: Expected axis has <expected_length> element, new values have <actual_length> elements."
-    ]
-  },
-  "BROADCAST_VARIABLE_NOT_LOADED": {
-    "message": [
-      "Broadcast variable `<variable>` not loaded."
-    ]
-  },
-  "CALL_BEFORE_INITIALIZE": {
-    "message": [
-      "Not supported to call `<func_name>` before initialize <object>."
-    ]
-  },
-  "CANNOT_ACCEPT_OBJECT_IN_TYPE": {
-    "message": [
-      "`<data_type>` can not accept object `<obj_name>` in type `<obj_type>`."
-    ]
-  },
-  "CANNOT_ACCESS_TO_DUNDER": {
-    "message": [
-      "Dunder(double underscore) attribute is for internal use only."
-    ]
-  },
-  "CANNOT_APPLY_IN_FOR_COLUMN": {
-    "message": [
-      "Cannot apply 'in' operator against a column: please use 'contains' in a string column or 'array_contains' function for an array column."
-    ]
-  },
-  "CANNOT_BE_EMPTY": {
-    "message": [
-      "At least one <item> must be specified."
-    ]
-  },
-  "CANNOT_BE_NONE": {
-    "message": [
-      "Argument `<arg_name>` cannot be None."
-    ]
-  },
-  "CANNOT_CONFIGURE_SPARK_CONNECT": {
-    "message": [
-      "Spark Connect server cannot be configured: Existing [<existing_url>], New [<new_url>]."
-    ]
-  },
-  "CANNOT_CONFIGURE_SPARK_CONNECT_MASTER": {
-    "message": [
-      "Spark Connect server and Spark master cannot be configured together: Spark master [<master_url>], Spark Connect [<connect_url>]."
-    ]
-  },
-  "CANNOT_CONVERT_COLUMN_INTO_BOOL": {
-    "message": [
-      "Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions."
-    ]
-  },
-  "CANNOT_CONVERT_TYPE": {
-    "message": [
-      "Cannot convert <from_type> into <to_type>."
-    ]
-  },
-  "CANNOT_DETERMINE_TYPE": {
-    "message": [
-      "Some of types cannot be determined after inferring."
-    ]
-  },
-  "CANNOT_GET_BATCH_ID": {
-    "message": [
-      "Could not get batch id from <obj_name>."
-    ]
-  },
-  "CANNOT_INFER_ARRAY_TYPE": {
-    "message": [
-      "Can not infer Array Type from a list with None as the first element."
-    ]
-  },
-  "CANNOT_INFER_EMPTY_SCHEMA": {
-    "message": [
-      "Can not infer schema from an empty dataset."
-    ]
-  },
-  "CANNOT_INFER_SCHEMA_FOR_TYPE": {
-    "message": [
-      "Can not infer schema for type: `<data_type>`."
-    ]
-  },
-  "CANNOT_INFER_TYPE_FOR_FIELD": {
-    "message": [
-      "Unable to infer the type of the field `<field_name>`."
-    ]
-  },
-  "CANNOT_MERGE_TYPE": {
-    "message": [
-      "Can not merge type `<data_type1>` and `<data_type2>`."
-    ]
-  },
-  "CANNOT_OPEN_SOCKET": {
-    "message": [
-      "Can not open socket: <errors>."
-    ]
-  },
-  "CANNOT_PARSE_DATATYPE": {
-    "message": [
-      "Unable to parse datatype. <msg>."
-    ]
-  },
-  "CANNOT_PROVIDE_METADATA": {
-    "message": [
-      "Metadata can only be provided for a single column."
-    ]
-  },
-  "CANNOT_SET_TOGETHER": {
-    "message": [
-      "<arg_list> should not be set together."
-    ]
-  },
-  "CANNOT_SPECIFY_RETURN_TYPE_FOR_UDF": {
-    "message": [
-      "returnType can not be specified when `<arg_name>` is a user-defined function, but got <return_type>."
-    ]
-  },
-  "CANNOT_WITHOUT": {
-    "message": [
-      "Cannot <condition1> without <condition2>."
-    ]
-  },
-  "COLUMN_IN_LIST": {
-    "message": [
-      "`<func_name>` does not allow a Column in a list."
-    ]
-  },
-  "CONNECT_URL_ALREADY_DEFINED": {
-    "message": [
-      "Only one Spark Connect client URL can be set; however, got a different URL [<new_url>] from the existing [<existing_url>]."
-    ]
-  },
-  "CONNECT_URL_NOT_SET": {
-    "message": [
-      "Cannot create a Spark Connect session because the Spark Connect remote URL has not been set. Please define the remote URL by setting either the 'spark.remote' option or the 'SPARK_REMOTE' environment variable."
-    ]
-  },
-  "CONTEXT_ONLY_VALID_ON_DRIVER": {
-    "message": [
-      "It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063."
-    ]
-  },
-  "CONTEXT_UNAVAILABLE_FOR_REMOTE_CLIENT": {
-    "message": [
-      "Remote client cannot create a SparkContext. Create SparkSession instead."
-    ]
-  },
-  "DATA_SOURCE_INVALID_RETURN_TYPE": {
-    "message": [
-      "Unsupported return type ('<type>') from Python data source '<name>'. Expected types: <supported_types>."
-    ]
-  },
-  "DATA_SOURCE_RETURN_SCHEMA_MISMATCH": {
-    "message": [
-      "Return schema mismatch in the result from 'read' method. Expected: <expected> columns, Found: <actual> columns. Make sure the returned values match the required output schema."
-    ]
-  },
-  "DATA_SOURCE_TYPE_MISMATCH": {
-    "message": [
-      "Expected <expected>, but got <actual>."
-    ]
-  },
-  "DIFFERENT_PANDAS_DATAFRAME": {
-    "message": [
-      "DataFrames are not almost equal:",
-      "Left:",
-      "<left>",
-      "<left_dtype>",
-      "Right:",
-      "<right>",
-      "<right_dtype>"
-    ]
-  },
-  "DIFFERENT_PANDAS_INDEX": {
-    "message": [
-      "Indices are not almost equal:",
-      "Left:",
-      "<left>",
-      "<left_dtype>",
-      "Right:",
-      "<right>",
-      "<right_dtype>"
-    ]
-  },
-  "DIFFERENT_PANDAS_MULTIINDEX": {
-    "message": [
-      "MultiIndices are not almost equal:",
-      "Left:",
-      "<left>",
-      "<left_dtype>",
-      "Right:",
-      "<right>",
-      "<right_dtype>"
-    ]
-  },
-  "DIFFERENT_PANDAS_SERIES": {
-    "message": [
-      "Series are not almost equal:",
-      "Left:",
-      "<left>",
-      "<left_dtype>",
-      "Right:",
-      "<right>",
-      "<right_dtype>"
-    ]
-  },
-  "DIFFERENT_ROWS": {
-    "message": [
-      "<error_msg>"
-    ]
-  },
-  "DIFFERENT_SCHEMA": {
-    "message": [
-      "Schemas do not match.",
-      "--- actual",
-      "+++ expected",
-      "<error_msg>"
-    ]
-  },
-  "DISALLOWED_TYPE_FOR_CONTAINER": {
-    "message": [
-      "Argument `<arg_name>`(type: <arg_type>) should only contain a type in [<allowed_types>], got <item_type>"
-    ]
-  },
-  "DUPLICATED_FIELD_NAME_IN_ARROW_STRUCT": {
-    "message": [
-      "Duplicated field names in Arrow Struct are not allowed, got <field_names>"
-    ]
-  },
-  "ERROR_OCCURRED_WHILE_CALLING": {
-    "message": [
-      "An error occurred while calling <func_name>: <error_msg>."
-    ]
-  },
-  "HIGHER_ORDER_FUNCTION_SHOULD_RETURN_COLUMN": {
-    "message": [
-      "Function `<func_name>` should return Column, got <return_type>."
-    ]
-  },
-  "INCORRECT_CONF_FOR_PROFILE": {
-    "message": [
-      "`spark.python.profile` or `spark.python.profile.memory` configuration",
-      " must be set to `true` to enable Python profile."
-    ]
-  },
-  "INDEX_NOT_POSITIVE": {
-    "message": [
-      "Index must be positive, got '<index>'."
-    ]
-  },
-  "INDEX_OUT_OF_RANGE": {
-    "message": [
-      "<arg_name> index out of range, got '<index>'."
-    ]
-  },
-  "INVALID_ARROW_UDTF_RETURN_TYPE": {
-    "message": [
-      "The return type of the arrow-optimized Python UDTF should be of type 'pandas.DataFrame', but the '<func>' method returned a value of type <return_type> with value: <value>."
-    ]
-  },
-  "INVALID_BROADCAST_OPERATION": {
-    "message": [
-      "Broadcast can only be <operation> in driver."
-    ]
-  },
-  "INVALID_CALL_ON_UNRESOLVED_OBJECT": {
-    "message": [
-      "Invalid call to `<func_name>` on unresolved object."
-    ]
-  },
-  "INVALID_CONNECT_URL": {
-    "message": [
-      "Invalid URL for Spark Connect: <detail>"
-    ]
-  },
-  "INVALID_INTERVAL_CASTING": {
-    "message": [
-      "Interval <start_field> to <end_field> is invalid."
-    ]
-  },
-  "INVALID_ITEM_FOR_CONTAINER": {
-    "message": [
-      "All items in `<arg_name>` should be in <allowed_types>, got <item_type>."
-    ]
-  },
-  "INVALID_MULTIPLE_ARGUMENT_CONDITIONS": {
-    "message": [
-      "[{arg_names}] cannot be <condition>."
-    ]
-  },
-  "INVALID_NDARRAY_DIMENSION": {
-    "message": [
-      "NumPy array input should be of <dimensions> dimensions."
-    ]
-  },
-  "INVALID_NUMBER_OF_DATAFRAMES_IN_GROUP": {
-    "message": [
-      "Invalid number of dataframes in group <dataframes_in_group>."
-    ]
-  },
-  "INVALID_PANDAS_UDF": {
-    "message": [
-      "Invalid function: <detail>"
-    ]
-  },
-  "INVALID_PANDAS_UDF_TYPE": {
-    "message": [
-      "`<arg_name>` should be one of the values from PandasUDFType, got <arg_type>"
-    ]
-  },
-  "INVALID_RETURN_TYPE_FOR_ARROW_UDF": {
-    "message": [
-      "Grouped and Cogrouped map Arrow UDF should return StructType for <eval_type>, got <return_type>."
-    ]
-  },
-  "INVALID_RETURN_TYPE_FOR_PANDAS_UDF": {
-    "message": [
-      "Pandas UDF should return StructType for <eval_type>, got <return_type>."
-    ]
-  },
-  "INVALID_SESSION_UUID_ID": {
-    "message": [
-      "Parameter value <arg_name> must be a valid UUID format: <origin>"
-    ]
-  },
-  "INVALID_TIMEOUT_TIMESTAMP": {
-    "message": [
-      "Timeout timestamp (<timestamp>) cannot be earlier than the current watermark (<watermark>)."
-    ]
-  },
-  "INVALID_TYPE": {
-    "message": [
-      "Argument `<arg_name>` should not be a <arg_type>."
-    ]
-  },
-  "INVALID_TYPENAME_CALL": {
-    "message": [
-      "StructField does not have typeName. Use typeName on its type explicitly instead."
-    ]
-  },
-  "INVALID_TYPE_DF_EQUALITY_ARG": {
-    "message": [
-      "Expected type <expected_type> for `<arg_name>` but got type <actual_type>."
-    ]
-  },
-  "INVALID_UDF_EVAL_TYPE": {
-    "message": [
-      "Eval type for UDF must be <eval_type>."
-    ]
-  },
-  "INVALID_UDTF_BOTH_RETURN_TYPE_AND_ANALYZE": {
-    "message": [
-      "The UDTF '<name>' is invalid. It has both its return type and an 'analyze' attribute. Please make it have one of either the return type or the 'analyze' static method in '<name>' and try again."
-    ]
-  },
-  "INVALID_UDTF_EVAL_TYPE": {
-    "message": [
-      "The eval type for the UDTF '<name>' is invalid. It must be one of <eval_type>."
-    ]
-  },
-  "INVALID_UDTF_HANDLER_TYPE": {
-    "message": [
-      "The UDTF is invalid. The function handler must be a class, but got '<type>'. Please provide a class as the function handler."
-    ]
-  },
-  "INVALID_UDTF_NO_EVAL": {
-    "message": [
-      "The UDTF '<name>' is invalid. It does not implement the required 'eval' method. Please implement the 'eval' method in '<name>' and try again."
-    ]
-  },
-  "INVALID_UDTF_RETURN_TYPE": {
-    "message": [
-      "The UDTF '<name>' is invalid. It does not specify its return type or implement the required 'analyze' static method. Please specify the return type or implement the 'analyze' static method in '<name>' and try again."
-    ]
-  },
-  "INVALID_WHEN_USAGE": {
-    "message": [
-      "when() can only be applied on a Column previously generated by when() function, and cannot be applied once otherwise() is applied."
-    ]
-  },
-  "INVALID_WINDOW_BOUND_TYPE": {
-    "message": [
-      "Invalid window bound type: <window_bound_type>."
-    ]
-  },
-  "JAVA_GATEWAY_EXITED": {
-    "message": [
-      "Java gateway process exited before sending its port number."
-    ]
-  },
-  "JVM_ATTRIBUTE_NOT_SUPPORTED": {
-    "message": [
-      "Attribute `<attr_name>` is not supported in Spark Connect as it depends on the JVM. If you need to use this attribute, do not use Spark Connect when creating your session. Visit https://spark.apache.org/docs/latest/sql-getting-started.html#starting-point-sparksession for creating regular Spark Session in detail."
-    ]
-  },
-  "KEY_NOT_EXISTS": {
-    "message": [
-      "Key `<key>` is not exists."
-    ]
-  },
-  "KEY_VALUE_PAIR_REQUIRED": {
-    "message": [
-      "Key-value pair or a list of pairs is required."
-    ]
-  },
-  "LENGTH_SHOULD_BE_THE_SAME": {
-    "message": [
-      "<arg1> and <arg2> should be of the same length, got <arg1_length> and <arg2_length>."
-    ]
-  },
-  "MASTER_URL_NOT_SET": {
-    "message": [
-      "A master URL must be set in your configuration."
-    ]
-  },
-  "MISSING_LIBRARY_FOR_PROFILER": {
-    "message": [
-      "Install the 'memory_profiler' library in the cluster to enable memory profiling."
-    ]
-  },
-  "MISSING_VALID_PLAN": {
-    "message": [
-      "Argument to <operator> does not contain a valid plan."
-    ]
-  },
-  "MIXED_TYPE_REPLACEMENT": {
-    "message": [
-      "Mixed type replacements are not supported."
-    ]
-  },
-  "NEGATIVE_VALUE": {
-    "message": [
-      "Value for `<arg_name>` must be greater than or equal to 0, got '<arg_value>'."
-    ]
-  },
-  "NOT_BOOL": {
-    "message": [
-      "Argument `<arg_name>` should be a bool, got <arg_type>."
-    ]
-  },
-  "NOT_BOOL_OR_DICT_OR_FLOAT_OR_INT_OR_LIST_OR_STR_OR_TUPLE": {
-    "message": [
-      "Argument `<arg_name>` should be a bool, dict, float, int, str or tuple, got <arg_type>."
-    ]
-  },
-  "NOT_BOOL_OR_DICT_OR_FLOAT_OR_INT_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a bool, dict, float, int or str, got <arg_type>."
-    ]
-  },
-  "NOT_BOOL_OR_FLOAT_OR_INT": {
-    "message": [
-      "Argument `<arg_name>` should be a bool, float or int, got <arg_type>."
-    ]
-  },
-  "NOT_BOOL_OR_FLOAT_OR_INT_OR_LIST_OR_NONE_OR_STR_OR_TUPLE": {
-    "message": [
-      "Argument `<arg_name>` should be a bool, float, int, list, None, str or tuple, got <arg_type>."
-    ]
-  },
-  "NOT_BOOL_OR_FLOAT_OR_INT_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a bool, float, int or str, got <arg_type>."
-    ]
-  },
-  "NOT_BOOL_OR_LIST": {
-    "message": [
-      "Argument `<arg_name>` should be a bool or list, got <arg_type>."
-    ]
-  },
-  "NOT_BOOL_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a bool or str, got <arg_type>."
-    ]
-  },
-  "NOT_CALLABLE": {
-    "message": [
-      "Argument `<arg_name>` should be a callable, got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN": {
-    "message": [
-      "Argument `<arg_name>` should be a Column, got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN_OR_DATATYPE_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a Column, str or DataType, but got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN_OR_FLOAT_OR_INT_OR_LIST_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a Column, float, integer, list or string, got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN_OR_INT": {
-    "message": [
-      "Argument `<arg_name>` should be a Column or int, got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN_OR_INT_OR_LIST_OR_STR_OR_TUPLE": {
-    "message": [
-      "Argument `<arg_name>` should be a Column, int, list, str or tuple, got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN_OR_INT_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a Column, int or str, got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN_OR_LIST_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a Column, list or str, got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a Column or str, got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN_OR_STR_OR_STRUCT": {
-    "message": [
-      "Argument `<arg_name>` should be a StructType, Column or str, got <arg_type>."
-    ]
-  },
-  "NOT_DATAFRAME": {
-    "message": [
-      "Argument `<arg_name>` should be a DataFrame, got <arg_type>."
-    ]
-  },
-  "NOT_DATATYPE_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a DataType or str, got <arg_type>."
-    ]
-  },
-  "NOT_DICT": {
-    "message": [
-      "Argument `<arg_name>` should be a dict, got <arg_type>."
-    ]
-  },
-  "NOT_EXPRESSION": {
-    "message": [
-      "Argument `<arg_name>` should be an Expression, got <arg_type>."
-    ]
-  },
-  "NOT_FLOAT_OR_INT": {
-    "message": [
-      "Argument `<arg_name>` should be a float or int, got <arg_type>."
-    ]
-  },
-  "NOT_FLOAT_OR_INT_OR_LIST_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a float, int, list or str, got <arg_type>."
-    ]
-  },
-  "NOT_IMPLEMENTED": {
-    "message": [
-      "<feature> is not implemented."
-    ]
-  },
-  "NOT_INSTANCE_OF": {
-    "message": [
-      "<value> is not an instance of type <type>."
-    ]
-  },
-  "NOT_INT": {
-    "message": [
-      "Argument `<arg_name>` should be an int, got <arg_type>."
-    ]
-  },
-  "NOT_INT_OR_SLICE_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be an int, slice or str, got <arg_type>."
-    ]
-  },
-  "NOT_IN_BARRIER_STAGE": {
-    "message": [
-      "It is not in a barrier stage."
-    ]
-  },
-  "NOT_ITERABLE": {
-    "message": [
-      "<objectName> is not iterable."
-    ]
-  },
-  "NOT_LIST": {
-    "message": [
-      "Argument `<arg_name>` should be a list, got <arg_type>."
-    ]
-  },
-  "NOT_LIST_OF_COLUMN": {
-    "message": [
-      "Argument `<arg_name>` should be a list[Column]."
-    ]
-  },
-  "NOT_LIST_OF_COLUMN_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a list[Column]."
-    ]
-  },
-  "NOT_LIST_OF_FLOAT_OR_INT": {
-    "message": [
-      "Argument `<arg_name>` should be a list[float, int], got <arg_type>."
-    ]
-  },
-  "NOT_LIST_OF_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a list[str], got <arg_type>."
-    ]
-  },
-  "NOT_LIST_OR_NONE_OR_STRUCT": {
-    "message": [
-      "Argument `<arg_name>` should be a list, None or StructType, got <arg_type>."
-    ]
-  },
-  "NOT_LIST_OR_STR_OR_TUPLE": {
-    "message": [
-      "Argument `<arg_name>` should be a list, str or tuple, got <arg_type>."
-    ]
-  },
-  "NOT_LIST_OR_TUPLE": {
-    "message": [
-      "Argument `<arg_name>` should be a list or tuple, got <arg_type>."
-    ]
-  },
-  "NOT_NUMERIC_COLUMNS": {
-    "message": [
-      "Numeric aggregation function can only be applied on numeric columns, got <invalid_columns>."
-    ]
-  },
-  "NOT_OBSERVATION_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be an Observation or str, got <arg_type>."
-    ]
-  },
-  "NOT_SAME_TYPE": {
-    "message": [
-      "Argument `<arg_name1>` and `<arg_name2>` should be the same type, got <arg_type1> and <arg_type2>."
-    ]
-  },
-  "NOT_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a str, got <arg_type>."
-    ]
-  },
-  "NOT_STRUCT": {
-    "message": [
-      "Argument `<arg_name>` should be a struct type, got <arg_type>."
-    ]
-  },
-  "NOT_STR_OR_LIST_OF_RDD": {
-    "message": [
-      "Argument `<arg_name>` should be a str or list[RDD], got <arg_type>."
-    ]
-  },
-  "NOT_STR_OR_STRUCT": {
-    "message": [
-      "Argument `<arg_name>` should be a str or struct type, got <arg_type>."
-    ]
-  },
-  "NOT_WINDOWSPEC": {
-    "message": [
-      "Argument `<arg_name>` should be a WindowSpec, got <arg_type>."
-    ]
-  },
-  "NO_ACTIVE_EXCEPTION": {
-    "message": [
-      "No active exception."
-    ]
-  },
-  "NO_ACTIVE_OR_DEFAULT_SESSION": {
-    "message": [
-      "No active or default Spark session found. Please create a new Spark session before running the code."
-    ]
-  },
-  "NO_ACTIVE_SESSION": {
-    "message": [
-      "No active Spark session found. Please create a new Spark session before running the code."
-    ]
-  },
-  "NO_OBSERVE_BEFORE_GET": {
-    "message": [
-      "Should observe by calling `DataFrame.observe` before `get`."
-    ]
-  },
-  "NO_SCHEMA_AND_DRIVER_DEFAULT_SCHEME": {
-    "message": [
-      "Only allows <arg_name> to be a path without scheme, and Spark Driver should use the default scheme to determine the destination file system."
-    ]
-  },
-  "ONLY_ALLOWED_FOR_SINGLE_COLUMN": {
-    "message": [
-      "Argument `<arg_name>` can only be provided for a single column."
-    ]
-  },
-  "ONLY_ALLOW_SINGLE_TRIGGER": {
-    "message": [
-      "Only a single trigger is allowed."
-    ]
-  },
-  "ONLY_SUPPORTED_WITH_SPARK_CONNECT": {
-    "message": [
-      "<feature> is only supported with Spark Connect; however, the current Spark session does not use Spark Connect."
-    ]
-  },
-  "PACKAGE_NOT_INSTALLED": {
-    "message": [
-      "<package_name> >= <minimum_version> must be installed; however, it was not found."
-    ]
-  },
-  "PIPE_FUNCTION_EXITED": {
-    "message": [
-      "Pipe function `<func_name>` exited with error code <error_code>."
-    ]
-  },
-  "PYTHON_HASH_SEED_NOT_SET": {
-    "message": [
-      "Randomness of hash of string should be disabled via PYTHONHASHSEED."
-    ]
-  },
-  "PYTHON_VERSION_MISMATCH": {
-    "message": [
-      "Python in worker has different version: <worker_version> than that in driver: <driver_version>, PySpark cannot run with different minor versions.",
-      "Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set."
-    ]
-  },
-  "RDD_TRANSFORM_ONLY_VALID_ON_DRIVER": {
-    "message": [
-      "It appears that you are attempting to broadcast an RDD or reference an RDD from an ",
-      "action or transformation. RDD transformations and actions can only be invoked by the ",
-      "driver, not inside of other transformations; for example, ",
-      "rdd1.map(lambda x: rdd2.values.count() * x) is invalid because the values ",
-      "transformation and count action cannot be performed inside of the rdd1.map ",
-      "transformation. For more information, see SPARK-5063."
-    ]
-  },
-  "READ_ONLY": {
-    "message": [
-      "<object> is read-only."
-    ]
-  },
-  "RESPONSE_ALREADY_RECEIVED": {
-    "message": [
-      "OPERATION_NOT_FOUND on the server but responses were already received from it."
-    ]
-  },
-  "RESULT_COLUMNS_MISMATCH_FOR_ARROW_UDF": {
-    "message": [
-      "Column names of the returned pyarrow.Table do not match specified schema.<missing><extra>"
-    ]
-  },
-  "RESULT_COLUMNS_MISMATCH_FOR_PANDAS_UDF": {
-    "message": [
-      "Column names of the returned pandas.DataFrame do not match specified schema.<missing><extra>"
-    ]
-  },
-  "RESULT_LENGTH_MISMATCH_FOR_PANDAS_UDF": {
-    "message": [
-      "Number of columns of the returned pandas.DataFrame doesn't match specified schema. Expected: <expected> Actual: <actual>"
-    ]
-  },
-  "RESULT_LENGTH_MISMATCH_FOR_SCALAR_ITER_PANDAS_UDF": {
-    "message": [
-      "The length of output in Scalar iterator pandas UDF should be the same with the input's; however, the length of output was <output_length> and the length of input was <input_length>."
-    ]
-  },
-  "RESULT_TYPE_MISMATCH_FOR_ARROW_UDF": {
-    "message": [
-      "Columns do not match in their data type: <mismatch>."
-    ]
-  },
-  "RETRIES_EXCEEDED": {
-    "message": [
-      "The maximum number of retries has been exceeded."
-    ]
-  },
-  "REUSE_OBSERVATION": {
-    "message": [
-      "An Observation can be used with a DataFrame only once."
-    ]
-  },
-  "SCHEMA_MISMATCH_FOR_PANDAS_UDF": {
-    "message": [
-      "Result vector from pandas_udf was not the required length: expected <expected>, got <actual>."
-    ]
-  },
-  "SESSION_ALREADY_EXIST": {
-    "message": [
-      "Cannot start a remote Spark session because there is a regular Spark session already running."
-    ]
-  },
-  "SESSION_NEED_CONN_STR_OR_BUILDER": {
-    "message": [
-      "Needs either connection string or channelBuilder (mutually exclusive) to create a new SparkSession."
-    ]
-  },
-  "SESSION_NOT_SAME": {
-    "message": [
-      "Both Datasets must belong to the same SparkSession."
-    ]
-  },
-  "SESSION_OR_CONTEXT_EXISTS": {
-    "message": [
-      "There should not be an existing Spark Session or Spark Context."
-    ]
-  },
-  "SESSION_OR_CONTEXT_NOT_EXISTS": {
-    "message": [
-      "SparkContext or SparkSession should be created first."
-    ]
-  },
-  "SLICE_WITH_STEP": {
-    "message": [
-      "Slice with step is not supported."
-    ]
-  },
-  "STATE_NOT_EXISTS": {
-    "message": [
-      "State is either not defined or has already been removed."
-    ]
-  },
-  "STOP_ITERATION_OCCURRED": {
-    "message": [
-      "Caught StopIteration thrown from user's code; failing the task: <exc>"
-    ]
-  },
-  "STOP_ITERATION_OCCURRED_FROM_SCALAR_ITER_PANDAS_UDF": {
-    "message": [
-      "pandas iterator UDF should exhaust the input iterator."
-    ]
-  },
-  "STREAMING_CONNECT_SERIALIZATION_ERROR": {
-    "message": [
-      "Cannot serialize the function `<name>`. If you accessed the Spark session, or a DataFrame defined outside of the function, or any object that contains a Spark session, please be aware that they are not allowed in Spark Connect. For `foreachBatch`, please access the Spark session using `df.sparkSession`, where `df` is the first parameter in your `foreachBatch` function. For `StreamingQueryListener`, please access the Spark session using `self.spark`. For details please check out the PySpark doc for `foreachBatch` and `StreamingQueryListener`."
-    ]
-  },
-  "TEST_CLASS_NOT_COMPILED": {
-    "message": [
-      "<test_class_path> doesn't exist. Spark sql test classes are not compiled."
-    ]
-  },
-  "TOO_MANY_VALUES": {
-    "message": [
-      "Expected <expected> values for `<item>`, got <actual>."
-    ]
-  },
-  "TYPE_HINT_SHOULD_BE_SPECIFIED": {
-    "message": [
-      "Type hints for <target> should be specified; however, got <sig>."
-    ]
-  },
-  "UDF_RETURN_TYPE": {
-    "message": [
-      "Return type of the user-defined function should be <expected>, but is <actual>."
-    ]
-  },
-  "UDTF_ARROW_TYPE_CAST_ERROR": {
-    "message": [
-      "Cannot convert the output value of the column '<col_name>' with type '<col_type>' to the specified return type of the column: '<arrow_type>'. Please check if the data types match and try again."
-    ]
-  },
-  "UDTF_CONSTRUCTOR_INVALID_IMPLEMENTS_ANALYZE_METHOD": {
-    "message": [
-      "Failed to evaluate the user-defined table function '<name>' because its constructor is invalid: the function implements the 'analyze' method, but its constructor has more than two arguments (including the 'self' reference). Please update the table function so that its constructor accepts exactly one 'self' argument, or one 'self' argument plus another argument for the result of the 'analyze' method, and try the query again."
-    ]
-  },
-  "UDTF_CONSTRUCTOR_INVALID_NO_ANALYZE_METHOD": {
-    "message": [
-      "Failed to evaluate the user-defined table function '<name>' because its constructor is invalid: the function does not implement the 'analyze' method, and its constructor has more than one argument (including the 'self' reference). Please update the table function so that its constructor accepts exactly one 'self' argument, and try the query again."
-    ]
-  },
-  "UDTF_EVAL_METHOD_ARGUMENTS_DO_NOT_MATCH_SIGNATURE": {
-    "message": [
-      "Failed to evaluate the user-defined table function '<name>' because the function arguments did not match the expected signature of the 'eval' method (<reason>). Please update the query so that this table function call provides arguments matching the expected signature, or else update the table function so that its 'eval' method accepts the provided arguments, and then try the query again."
-    ]
-  },
-  "UDTF_EXEC_ERROR": {
-    "message": [
-      "User defined table function encountered an error in the '<method_name>' method: <error>"
-    ]
-  },
-  "UDTF_INVALID_OUTPUT_ROW_TYPE": {
-    "message": [
-      "The type of an individual output row in the '<func>' method of the UDTF is invalid. Each row should be a tuple, list, or dict, but got '<type>'. Please make sure that the output rows are of the correct type."
-    ]
-  },
-  "UDTF_RETURN_NOT_ITERABLE": {
-    "message": [
-      "The return value of the '<func>' method of the UDTF is invalid. It should be an iterable (e.g., generator or list), but got '<type>'. Please make sure that the UDTF returns one of these types."
-    ]
-  },
-  "UDTF_RETURN_SCHEMA_MISMATCH": {
-    "message": [
-      "The number of columns in the result does not match the specified schema. Expected column count: <expected>, Actual column count: <actual>. Please make sure the values returned by the '<func>' method have the same number of columns as specified in the output schema."
-    ]
-  },
-  "UDTF_RETURN_TYPE_MISMATCH": {
-    "message": [
-      "Mismatch in return type for the UDTF '<name>'. Expected a 'StructType', but got '<return_type>'. Please ensure the return type is a correctly formatted StructType."
-    ]
-  },
-  "UDTF_SERIALIZATION_ERROR": {
-    "message": [
-      "Cannot serialize the UDTF '<name>': <message>"
-    ]
-  },
-  "UNEXPECTED_RESPONSE_FROM_SERVER": {
-    "message": [
-      "Unexpected response from iterator server."
-    ]
-  },
-  "UNEXPECTED_TUPLE_WITH_STRUCT": {
-    "message": [
-      "Unexpected tuple <tuple> with StructType."
-    ]
-  },
-  "UNKNOWN_EXPLAIN_MODE": {
-    "message": [
-      "Unknown explain mode: '<explain_mode>'. Accepted explain modes are 'simple', 'extended', 'codegen', 'cost', 'formatted'."
-    ]
-  },
-  "UNKNOWN_INTERRUPT_TYPE": {
-    "message": [
-      "Unknown interrupt type: '<interrupt_type>'. Accepted interrupt types are 'all'."
-    ]
-  },
-  "UNKNOWN_RESPONSE": {
-    "message": [
-      "Unknown response: <response>."
-    ]
-  },
-  "UNKNOWN_VALUE_FOR": {
-    "message": [
-      "Unknown value for `<var>`."
-    ]
-  },
-  "UNSUPPORTED_DATA_TYPE": {
-    "message": [
-      "Unsupported DataType `<data_type>`."
-    ]
-  },
-  "UNSUPPORTED_DATA_TYPE_FOR_ARROW": {
-    "message": [
-      "Single data type <data_type> is not supported with Arrow."
-    ]
-  },
-  "UNSUPPORTED_DATA_TYPE_FOR_ARROW_CONVERSION": {
-    "message": [
-      "<data_type> is not supported in conversion to Arrow."
-    ]
-  },
-  "UNSUPPORTED_DATA_TYPE_FOR_ARROW_VERSION": {
-    "message": [
-      "<data_type> is only supported with pyarrow 2.0.0 and above."
-    ]
-  },
-  "UNSUPPORTED_JOIN_TYPE": {
-    "message": [
-      "Unsupported join type: <join_type>. Supported join types include: 'inner', 'outer', 'full', 'fullouter', 'full_outer', 'leftouter', 'left', 'left_outer', 'rightouter', 'right', 'right_outer', 'leftsemi', 'left_semi', 'semi', 'leftanti', 'left_anti', 'anti', 'cross'."
-    ]
-  },
-  "UNSUPPORTED_LITERAL": {
-    "message": [
-      "Unsupported Literal '<literal>'."
-    ]
-  },
-  "UNSUPPORTED_LOCAL_CONNECTION_STRING": {
-    "message": [
-      "Creating new SparkSessions with `local` connection string is not supported."
-    ]
-  },
-  "UNSUPPORTED_NUMPY_ARRAY_SCALAR": {
-    "message": [
-      "The type of array scalar '<dtype>' is not supported."
-    ]
-  },
-  "UNSUPPORTED_OPERATION": {
-    "message": [
-      "<operation> is not supported."
-    ]
-  },
-  "UNSUPPORTED_PACKAGE_VERSION": {
-    "message": [
-      "<package_name> >= <minimum_version> must be installed; however, your version is <current_version>."
-    ]
-  },
-  "UNSUPPORTED_PARAM_TYPE_FOR_HIGHER_ORDER_FUNCTION": {
-    "message": [
-      "Function `<func_name>` should use only POSITIONAL or POSITIONAL OR KEYWORD arguments."
-    ]
-  },
-  "UNSUPPORTED_SIGNATURE": {
-    "message": [
-      "Unsupported signature: <signature>."
-    ]
-  },
-  "UNSUPPORTED_WITH_ARROW_OPTIMIZATION": {
-    "message": [
-      "<feature> is not supported with Arrow optimization enabled in Python UDFs. Disable 'spark.sql.execution.pythonUDF.arrow.enabled' to workaround."
-    ]
-  },
-  "VALUE_ALLOWED": {
-    "message": [
-      "Value for `<arg_name>` does not allow <disallowed_value>."
-    ]
-  },
-  "VALUE_NOT_ACCESSIBLE": {
-    "message": [
-      "Value `<value>` cannot be accessed inside tasks."
-    ]
-  },
-  "VALUE_NOT_ALLOWED": {
-    "message": [
-      "Value for `<arg_name>` has to be amongst the following values: <allowed_values>."
-    ]
-  },
-  "VALUE_NOT_ANY_OR_ALL": {
-    "message": [
-      "Value for `<arg_name>` must be 'any' or 'all', got '<arg_value>'."
-    ]
-  },
-  "VALUE_NOT_BETWEEN": {
-    "message": [
-      "Value for `<arg_name>` must be between <min> and <max>."
-    ]
-  },
-  "VALUE_NOT_NON_EMPTY_STR": {
-    "message": [
-      "Value for `<arg_name>` must be a non-empty string, got '<arg_value>'."
-    ]
-  },
-  "VALUE_NOT_PEARSON": {
-    "message": [
-      "Value for `<arg_name>` only supports the 'pearson', got '<arg_value>'."
-    ]
-  },
-  "VALUE_NOT_PLAIN_COLUMN_REFERENCE": {
-    "message": [
-      "Value `<val>` in `<field_name>` should be a plain column reference such as `df.col` or `col('column')`."
-    ]
-  },
-  "VALUE_NOT_POSITIVE": {
-    "message": [
-      "Value for `<arg_name>` must be positive, got '<arg_value>'."
-    ]
-  },
-  "VALUE_NOT_TRUE": {
-    "message": [
-      "Value for `<arg_name>` must be True, got '<arg_value>'."
-    ]
-  },
-  "VALUE_OUT_OF_BOUND": {
-    "message": [
-      "Value for `<arg_name>` must be greater than <lower_bound> or less than <upper_bound>, got <actual>"
-    ]
-  },
-  "WRONG_NUM_ARGS_FOR_HIGHER_ORDER_FUNCTION": {
-    "message": [
-      "Function `<func_name>` should take between 1 and 3 arguments, but the provided function takes <num_args>."
-    ]
-  },
-  "WRONG_NUM_COLUMNS": {
-    "message": [
-      "Function `<func_name>` should take at least <num_cols> columns."
-    ]
-  }
-}
-'''
-
+# Note: Though we call them "error classes" here, the proper name is "error conditions",
+#   hence why the name of the JSON file is different.
+#   For more information, please see: https://issues.apache.org/jira/browse/SPARK-46810
+# Note: When we drop support for Python 3.8, we should migrate from importlib.resources.read_text()
+#   to importlib.resources.files().joinpath().read_text().
+#   See: https://docs.python.org/3/library/importlib.resources.html#importlib.resources.open_text
+ERROR_CLASSES_JSON = importlib.resources.read_text("pyspark.errors", "error-conditions.json")

Review Comment:
   @HyukjinKwon - I think you are actually concerned about this line here, right?
   
   I updated the PR description with the tests I ran to confirm that the JSON file gets packaged correctly and loaded at runtime.
   
   You asked about installing PySpark via pip vs. downloading it from the site. I only tested pip, but both should work because `importlib.resources` uses the same access pattern that `import` does.
   
   Is there some additional test I can run to cover the scenarios you are concerned about?
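
For context, here is a minimal sketch (not from the PR) of the access pattern described above: `importlib.resources` resolves the file through the same import machinery that locates `pyspark.errors` itself, so any layout Python can import the package from (a site-packages directory or an importable zip archive) can also serve the JSON.

```python
import json
import importlib.resources

# Resolved via the package's import loader, not a hard-coded filesystem path,
# so this works for pip installs and zip-packaged distributions alike.
raw = importlib.resources.read_text("pyspark.errors", "error-conditions.json")
error_conditions = json.loads(raw)
print(len(error_conditions), "error conditions loaded")
```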



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #44920:
URL: https://github.com/apache/spark/pull/44920#discussion_r1470387128


##########
python/pyspark/errors/exceptions/__init__.py:
##########
@@ -18,39 +18,15 @@
 
 def _write_self() -> None:
     import json
+    from pathlib import Path
     from pyspark.errors import error_classes
 
-    with open("python/pyspark/errors/error_classes.py", "w") as f:
-        error_class_py_file = """#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#    http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-
-# NOTE: Automatically sort this file via
-# - cd $SPARK_HOME
-# - bin/pyspark
-# - from pyspark.errors.exceptions import _write_self; _write_self()
-import json
-
-
-ERROR_CLASSES_JSON = '''
-%s
-'''
+    ERRORS_DIR = Path(__file__).parents[1]

Review Comment:
   Just to confirm, so it works both when you `pip install pyspark`  and when download this from Apache Spark channel (https://spark.apache.org/downloads.html)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #44920:
URL: https://github.com/apache/spark/pull/44920#discussion_r1584091032


##########
python/pyspark/errors/exceptions/__init__.py:
##########
@@ -18,39 +18,15 @@
 
 def _write_self() -> None:
     import json
+    from pathlib import Path
     from pyspark.errors import error_classes
 
-    with open("python/pyspark/errors/error_classes.py", "w") as f:
-        error_class_py_file = """#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#    http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-
-# NOTE: Automatically sort this file via
-# - cd $SPARK_HOME
-# - bin/pyspark
-# - from pyspark.errors.exceptions import _write_self; _write_self()
-import json
-
-
-ERROR_CLASSES_JSON = '''
-%s
-'''
+    ERRORS_DIR = Path(__file__).parents[1]
 
-ERROR_CLASSES_MAP = json.loads(ERROR_CLASSES_JSON)
-""" % json.dumps(
-            error_classes.ERROR_CLASSES_MAP, sort_keys=True, indent=2
+    with open(ERRORS_DIR / "error-conditions.json", "w") as f:

Review Comment:
   We should read `error-conditions.json` from `pyspark.zip` ... and that's the real problem.
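
To illustrate the concern, a hedged sketch (not from the PR): when PySpark is imported from a zip archive, `__file__` points inside the archive, so a filesystem path derived from it cannot be opened directly, whereas `importlib.resources` delegates to the zip-aware loader.

```python
import importlib.resources
import pyspark.errors

# Under zipimport, __file__ looks like .../pyspark.zip/pyspark/errors/__init__.py.
# Calling open() on such a path fails with an OSError, because the zip archive
# is a file on disk, not a directory.
print(pyspark.errors.__file__)

# This call still succeeds, since it goes through the package's loader:
text = importlib.resources.read_text("pyspark.errors", "error-conditions.json")
```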



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #44920:
URL: https://github.com/apache/spark/pull/44920#discussion_r1469019240


##########
python/pyspark/errors/error_classes.py:
##########
@@ -15,1110 +15,16 @@
 # limitations under the License.
 #
 
-# NOTE: Automatically sort this file via
-# - cd $SPARK_HOME
-# - bin/pyspark
-# - from pyspark.errors.exceptions import _write_self; _write_self()
 import json
+from pathlib import Path
 
+THIS_DIR = Path(__file__).parent

Review Comment:
   Hmmm... I thought we should change something in `setup.py` to include the JSON as a data file.
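
For illustration, a sketch of the kind of `setup.py` change being suggested here; the package list is abbreviated and this is not the actual PR diff:

```python
from setuptools import setup

setup(
    name="pyspark",
    packages=["pyspark", "pyspark.errors"],  # abbreviated for illustration
    # Declare the JSON as package data so setuptools copies it into
    # wheels and installs it next to the package's modules:
    package_data={"pyspark.errors": ["error-conditions.json"]},
)
```

(As the later comments in this thread show, the fix ultimately went through `MANIFEST.in`.)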



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

Posted by "nchammas (via GitHub)" <gi...@apache.org>.
nchammas commented on code in PR #44920:
URL: https://github.com/apache/spark/pull/44920#discussion_r1469030237


##########
python/pyspark/errors/error_classes.py:
##########
@@ -15,1110 +15,16 @@
 # limitations under the License.
 #
 
-# NOTE: Automatically sort this file via
-# - cd $SPARK_HOME
-# - bin/pyspark
-# - from pyspark.errors.exceptions import _write_self; _write_self()
 import json
+from pathlib import Path
 
+THIS_DIR = Path(__file__).parent

Review Comment:
   Actually, you are right. `MANIFEST.in` needs to be adjusted. I see the JSON file added to the `dist/` directory, but it doesn't get installed into the virtual environment I created for testing.
   
   To find this, I had to adjust my test from `pip install -e .` to `pip install .`, so that the virtual environment gets its own copy of PySpark and does not rely on the source repo.
   
   Fix incoming...
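
A hedged way to reproduce that check: with a regular (non-editable) install, the imported module should resolve into the virtual environment rather than the source checkout, and the data file should load from there.

```python
import importlib.resources
import pyspark.errors

# With `pip install .` this should point into the venv's site-packages;
# with `pip install -e .` it would point back at the source repo, which
# can mask a packaging bug like the one described above.
print(pyspark.errors.__file__)

# Raises FileNotFoundError if the JSON was left out of the built package:
importlib.resources.read_text("pyspark.errors", "error-conditions.json")
```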



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

Posted by "nchammas (via GitHub)" <gi...@apache.org>.
nchammas commented on PR #44920:
URL: https://github.com/apache/spark/pull/44920#issuecomment-1913899832

   > since we cannot integrate with the [error-classes.json](https://github.com/databricks/runtime/blob/master/common/utils/src/main/resources/error/error-classes.json) file on the JVM side
   
   This is for a separate potential PR, but if it were possible to use the "main" error JSON files from Scala in PySpark automatically, would we want to do that? I see that PySpark's errors don't define a SQLSTATE, so I assumed they were a separate thing and we didn't want to reuse the main error definitions.
   
   > So I agree to change to a `json` file if the advantage of using a `json` file over using a `py` file is clear, and if there are no issues with packaging. Also you might need to take a deeper look at the documentation. For example we're pointing the `py` file path from [Error classes in PySpark](https://spark.apache.org/docs/latest/api/python/development/errors.html#error-classes-in-pyspark).
   
   Is this a reference to this command? https://github.com/apache/spark/blob/8060e7e73170c0122acb2a005f3c54487e226208/python/docs/source/conf.py#L42
   
   Anyway, just FYI I am in the middle of revamping the main error documentation to make it easier to maintain. First I am cleaning up the terminology in #44902, and then I will consolidate the various `sql-error-*` pages. I will tag you in those PRs when I open them.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #44920:
URL: https://github.com/apache/spark/pull/44920#issuecomment-2026508352

   It's the one at `python/lib/pyspark.zip` when you finish building.
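
A quick hedged check (the path is taken from the comment above) that the data file actually lands inside that archive:

```python
import zipfile

with zipfile.ZipFile("python/lib/pyspark.zip") as zf:
    matches = [n for n in zf.namelist() if n.endswith("error-conditions.json")]
    print(matches)  # expect something like ['pyspark/errors/error-conditions.json']
```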


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

Posted by "nchammas (via GitHub)" <gi...@apache.org>.
nchammas commented on code in PR #44920:
URL: https://github.com/apache/spark/pull/44920#discussion_r1585016463


##########
python/pyspark/errors/error_classes.py:
##########
@@ -15,1160 +15,15 @@
 # limitations under the License.
 #
 
-# NOTE: Automatically sort this file via
-# - cd $SPARK_HOME
-# - bin/pyspark
-# - from pyspark.errors.exceptions import _write_self; _write_self()
 import json
+import importlib.resources
 
-
-ERROR_CLASSES_JSON = '''
-{
-  "APPLICATION_NAME_NOT_SET": {
-    "message": [
-      "An application name must be set in your configuration."
-    ]
-  },
-  "ARGUMENT_REQUIRED": {
-    "message": [
-      "Argument `<arg_name>` is required when <condition>."
-    ]
-  },
-  "ARROW_LEGACY_IPC_FORMAT": {
-    "message": [
-      "Arrow legacy IPC format is not supported in PySpark, please unset ARROW_PRE_0_15_IPC_FORMAT."
-    ]
-  },
-  "ATTRIBUTE_NOT_CALLABLE": {
-    "message": [
-      "Attribute `<attr_name>` in provided object `<obj_name>` is not callable."
-    ]
-  },
-  "ATTRIBUTE_NOT_SUPPORTED": {
-    "message": [
-      "Attribute `<attr_name>` is not supported."
-    ]
-  },
-  "AXIS_LENGTH_MISMATCH": {
-    "message": [
-      "Length mismatch: Expected axis has <expected_length> element, new values have <actual_length> elements."
-    ]
-  },
-  "BROADCAST_VARIABLE_NOT_LOADED": {
-    "message": [
-      "Broadcast variable `<variable>` not loaded."
-    ]
-  },
-  "CALL_BEFORE_INITIALIZE": {
-    "message": [
-      "Not supported to call `<func_name>` before initialize <object>."
-    ]
-  },
-  "CANNOT_ACCEPT_OBJECT_IN_TYPE": {
-    "message": [
-      "`<data_type>` can not accept object `<obj_name>` in type `<obj_type>`."
-    ]
-  },
-  "CANNOT_ACCESS_TO_DUNDER": {
-    "message": [
-      "Dunder(double underscore) attribute is for internal use only."
-    ]
-  },
-  "CANNOT_APPLY_IN_FOR_COLUMN": {
-    "message": [
-      "Cannot apply 'in' operator against a column: please use 'contains' in a string column or 'array_contains' function for an array column."
-    ]
-  },
-  "CANNOT_BE_EMPTY": {
-    "message": [
-      "At least one <item> must be specified."
-    ]
-  },
-  "CANNOT_BE_NONE": {
-    "message": [
-      "Argument `<arg_name>` cannot be None."
-    ]
-  },
-  "CANNOT_CONFIGURE_SPARK_CONNECT": {
-    "message": [
-      "Spark Connect server cannot be configured: Existing [<existing_url>], New [<new_url>]."
-    ]
-  },
-  "CANNOT_CONFIGURE_SPARK_CONNECT_MASTER": {
-    "message": [
-      "Spark Connect server and Spark master cannot be configured together: Spark master [<master_url>], Spark Connect [<connect_url>]."
-    ]
-  },
-  "CANNOT_CONVERT_COLUMN_INTO_BOOL": {
-    "message": [
-      "Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions."
-    ]
-  },
-  "CANNOT_CONVERT_TYPE": {
-    "message": [
-      "Cannot convert <from_type> into <to_type>."
-    ]
-  },
-  "CANNOT_DETERMINE_TYPE": {
-    "message": [
-      "Some of types cannot be determined after inferring."
-    ]
-  },
-  "CANNOT_GET_BATCH_ID": {
-    "message": [
-      "Could not get batch id from <obj_name>."
-    ]
-  },
-  "CANNOT_INFER_ARRAY_TYPE": {
-    "message": [
-      "Can not infer Array Type from a list with None as the first element."
-    ]
-  },
-  "CANNOT_INFER_EMPTY_SCHEMA": {
-    "message": [
-      "Can not infer schema from an empty dataset."
-    ]
-  },
-  "CANNOT_INFER_SCHEMA_FOR_TYPE": {
-    "message": [
-      "Can not infer schema for type: `<data_type>`."
-    ]
-  },
-  "CANNOT_INFER_TYPE_FOR_FIELD": {
-    "message": [
-      "Unable to infer the type of the field `<field_name>`."
-    ]
-  },
-  "CANNOT_MERGE_TYPE": {
-    "message": [
-      "Can not merge type `<data_type1>` and `<data_type2>`."
-    ]
-  },
-  "CANNOT_OPEN_SOCKET": {
-    "message": [
-      "Can not open socket: <errors>."
-    ]
-  },
-  "CANNOT_PARSE_DATATYPE": {
-    "message": [
-      "Unable to parse datatype. <msg>."
-    ]
-  },
-  "CANNOT_PROVIDE_METADATA": {
-    "message": [
-      "Metadata can only be provided for a single column."
-    ]
-  },
-  "CANNOT_SET_TOGETHER": {
-    "message": [
-      "<arg_list> should not be set together."
-    ]
-  },
-  "CANNOT_SPECIFY_RETURN_TYPE_FOR_UDF": {
-    "message": [
-      "returnType can not be specified when `<arg_name>` is a user-defined function, but got <return_type>."
-    ]
-  },
-  "CANNOT_WITHOUT": {
-    "message": [
-      "Cannot <condition1> without <condition2>."
-    ]
-  },
-  "COLUMN_IN_LIST": {
-    "message": [
-      "`<func_name>` does not allow a Column in a list."
-    ]
-  },
-  "CONNECT_URL_ALREADY_DEFINED": {
-    "message": [
-      "Only one Spark Connect client URL can be set; however, got a different URL [<new_url>] from the existing [<existing_url>]."
-    ]
-  },
-  "CONNECT_URL_NOT_SET": {
-    "message": [
-      "Cannot create a Spark Connect session because the Spark Connect remote URL has not been set. Please define the remote URL by setting either the 'spark.remote' option or the 'SPARK_REMOTE' environment variable."
-    ]
-  },
-  "CONTEXT_ONLY_VALID_ON_DRIVER": {
-    "message": [
-      "It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063."
-    ]
-  },
-  "CONTEXT_UNAVAILABLE_FOR_REMOTE_CLIENT": {
-    "message": [
-      "Remote client cannot create a SparkContext. Create SparkSession instead."
-    ]
-  },
-  "DATA_SOURCE_CREATE_ERROR": {
-    "message": [
-      "Failed to create python data source instance, error: <error>."
-    ]
-  },
-  "DATA_SOURCE_INVALID_RETURN_TYPE": {
-    "message": [
-      "Unsupported return type ('<type>') from Python data source '<name>'. Expected types: <supported_types>."
-    ]
-  },
-  "DATA_SOURCE_RETURN_SCHEMA_MISMATCH": {
-    "message": [
-      "Return schema mismatch in the result from 'read' method. Expected: <expected> columns, Found: <actual> columns. Make sure the returned values match the required output schema."
-    ]
-  },
-  "DATA_SOURCE_TYPE_MISMATCH": {
-    "message": [
-      "Expected <expected>, but got <actual>."
-    ]
-  },
-  "DIFFERENT_PANDAS_DATAFRAME": {
-    "message": [
-      "DataFrames are not almost equal:",
-      "Left:",
-      "<left>",
-      "<left_dtype>",
-      "Right:",
-      "<right>",
-      "<right_dtype>"
-    ]
-  },
-  "DIFFERENT_PANDAS_INDEX": {
-    "message": [
-      "Indices are not almost equal:",
-      "Left:",
-      "<left>",
-      "<left_dtype>",
-      "Right:",
-      "<right>",
-      "<right_dtype>"
-    ]
-  },
-  "DIFFERENT_PANDAS_MULTIINDEX": {
-    "message": [
-      "MultiIndices are not almost equal:",
-      "Left:",
-      "<left>",
-      "<left_dtype>",
-      "Right:",
-      "<right>",
-      "<right_dtype>"
-    ]
-  },
-  "DIFFERENT_PANDAS_SERIES": {
-    "message": [
-      "Series are not almost equal:",
-      "Left:",
-      "<left>",
-      "<left_dtype>",
-      "Right:",
-      "<right>",
-      "<right_dtype>"
-    ]
-  },
-  "DIFFERENT_ROWS": {
-    "message": [
-      "<error_msg>"
-    ]
-  },
-  "DIFFERENT_SCHEMA": {
-    "message": [
-      "Schemas do not match.",
-      "--- actual",
-      "+++ expected",
-      "<error_msg>"
-    ]
-  },
-  "DISALLOWED_TYPE_FOR_CONTAINER": {
-    "message": [
-      "Argument `<arg_name>`(type: <arg_type>) should only contain a type in [<allowed_types>], got <item_type>"
-    ]
-  },
-  "DUPLICATED_FIELD_NAME_IN_ARROW_STRUCT": {
-    "message": [
-      "Duplicated field names in Arrow Struct are not allowed, got <field_names>"
-    ]
-  },
-  "ERROR_OCCURRED_WHILE_CALLING": {
-    "message": [
-      "An error occurred while calling <func_name>: <error_msg>."
-    ]
-  },
-  "FIELD_DATA_TYPE_UNACCEPTABLE": {
-    "message": [
-      "<data_type> can not accept object <obj> in type <obj_type>."
-    ]
-  },
-  "FIELD_DATA_TYPE_UNACCEPTABLE_WITH_NAME": {
-    "message": [
-      "<field_name>: <data_type> can not accept object <obj> in type <obj_type>."
-    ]
-  },
-  "FIELD_NOT_NULLABLE": {
-    "message": [
-      "Field is not nullable, but got None."
-    ]
-  },
-  "FIELD_NOT_NULLABLE_WITH_NAME": {
-    "message": [
-      "<field_name>: This field is not nullable, but got None."
-    ]
-  },
-  "FIELD_STRUCT_LENGTH_MISMATCH": {
-    "message": [
-      "Length of object (<object_length>) does not match with length of fields (<field_length>)."
-    ]
-  },
-  "FIELD_STRUCT_LENGTH_MISMATCH_WITH_NAME": {
-    "message": [
-      "<field_name>: Length of object (<object_length>) does not match with length of fields (<field_length>)."
-    ]
-  },
-  "FIELD_TYPE_MISMATCH": {
-    "message": [
-      "<obj> is not an instance of type <data_type>."
-    ]
-  },
-  "FIELD_TYPE_MISMATCH_WITH_NAME": {
-    "message": [
-      "<field_name>: <obj> is not an instance of type <data_type>."
-    ]
-  },
-  "HIGHER_ORDER_FUNCTION_SHOULD_RETURN_COLUMN": {
-    "message": [
-      "Function `<func_name>` should return Column, got <return_type>."
-    ]
-  },
-  "INCORRECT_CONF_FOR_PROFILE": {
-    "message": [
-      "`spark.python.profile` or `spark.python.profile.memory` configuration",
-      " must be set to `true` to enable Python profile."
-    ]
-  },
-  "INDEX_NOT_POSITIVE": {
-    "message": [
-      "Index must be positive, got '<index>'."
-    ]
-  },
-  "INDEX_OUT_OF_RANGE": {
-    "message": [
-      "<arg_name> index out of range, got '<index>'."
-    ]
-  },
-  "INVALID_ARROW_UDTF_RETURN_TYPE": {
-    "message": [
-      "The return type of the arrow-optimized Python UDTF should be of type 'pandas.DataFrame', but the '<func>' method returned a value of type <return_type> with value: <value>."
-    ]
-  },
-  "INVALID_BROADCAST_OPERATION": {
-    "message": [
-      "Broadcast can only be <operation> in driver."
-    ]
-  },
-  "INVALID_CALL_ON_UNRESOLVED_OBJECT": {
-    "message": [
-      "Invalid call to `<func_name>` on unresolved object."
-    ]
-  },
-  "INVALID_CONNECT_URL": {
-    "message": [
-      "Invalid URL for Spark Connect: <detail>"
-    ]
-  },
-  "INVALID_INTERVAL_CASTING": {
-    "message": [
-      "Interval <start_field> to <end_field> is invalid."
-    ]
-  },
-  "INVALID_ITEM_FOR_CONTAINER": {
-    "message": [
-      "All items in `<arg_name>` should be in <allowed_types>, got <item_type>."
-    ]
-  },
-  "INVALID_MULTIPLE_ARGUMENT_CONDITIONS": {
-    "message": [
-      "[{arg_names}] cannot be <condition>."
-    ]
-  },
-  "INVALID_NDARRAY_DIMENSION": {
-    "message": [
-      "NumPy array input should be of <dimensions> dimensions."
-    ]
-  },
-  "INVALID_NUMBER_OF_DATAFRAMES_IN_GROUP": {
-    "message": [
-      "Invalid number of dataframes in group <dataframes_in_group>."
-    ]
-  },
-  "INVALID_PANDAS_UDF": {
-    "message": [
-      "Invalid function: <detail>"
-    ]
-  },
-  "INVALID_PANDAS_UDF_TYPE": {
-    "message": [
-      "`<arg_name>` should be one of the values from PandasUDFType, got <arg_type>"
-    ]
-  },
-  "INVALID_RETURN_TYPE_FOR_ARROW_UDF": {
-    "message": [
-      "Grouped and Cogrouped map Arrow UDF should return StructType for <eval_type>, got <return_type>."
-    ]
-  },
-  "INVALID_RETURN_TYPE_FOR_PANDAS_UDF": {
-    "message": [
-      "Pandas UDF should return StructType for <eval_type>, got <return_type>."
-    ]
-  },
-  "INVALID_SESSION_UUID_ID": {
-    "message": [
-      "Parameter value <arg_name> must be a valid UUID format: <origin>"
-    ]
-  },
-  "INVALID_TIMEOUT_TIMESTAMP": {
-    "message": [
-      "Timeout timestamp (<timestamp>) cannot be earlier than the current watermark (<watermark>)."
-    ]
-  },
-  "INVALID_TYPE": {
-    "message": [
-      "Argument `<arg_name>` should not be a <arg_type>."
-    ]
-  },
-  "INVALID_TYPENAME_CALL": {
-    "message": [
-      "StructField does not have typeName. Use typeName on its type explicitly instead."
-    ]
-  },
-  "INVALID_TYPE_DF_EQUALITY_ARG": {
-    "message": [
-      "Expected type <expected_type> for `<arg_name>` but got type <actual_type>."
-    ]
-  },
-  "INVALID_UDF_EVAL_TYPE": {
-    "message": [
-      "Eval type for UDF must be <eval_type>."
-    ]
-  },
-  "INVALID_UDTF_BOTH_RETURN_TYPE_AND_ANALYZE": {
-    "message": [
-      "The UDTF '<name>' is invalid. It has both its return type and an 'analyze' attribute. Please make it have one of either the return type or the 'analyze' static method in '<name>' and try again."
-    ]
-  },
-  "INVALID_UDTF_EVAL_TYPE": {
-    "message": [
-      "The eval type for the UDTF '<name>' is invalid. It must be one of <eval_type>."
-    ]
-  },
-  "INVALID_UDTF_HANDLER_TYPE": {
-    "message": [
-      "The UDTF is invalid. The function handler must be a class, but got '<type>'. Please provide a class as the function handler."
-    ]
-  },
-  "INVALID_UDTF_NO_EVAL": {
-    "message": [
-      "The UDTF '<name>' is invalid. It does not implement the required 'eval' method. Please implement the 'eval' method in '<name>' and try again."
-    ]
-  },
-  "INVALID_UDTF_RETURN_TYPE": {
-    "message": [
-      "The UDTF '<name>' is invalid. It does not specify its return type or implement the required 'analyze' static method. Please specify the return type or implement the 'analyze' static method in '<name>' and try again."
-    ]
-  },
-  "INVALID_WHEN_USAGE": {
-    "message": [
-      "when() can only be applied on a Column previously generated by when() function, and cannot be applied once otherwise() is applied."
-    ]
-  },
-  "INVALID_WINDOW_BOUND_TYPE": {
-    "message": [
-      "Invalid window bound type: <window_bound_type>."
-    ]
-  },
-  "JAVA_GATEWAY_EXITED": {
-    "message": [
-      "Java gateway process exited before sending its port number."
-    ]
-  },
-  "JVM_ATTRIBUTE_NOT_SUPPORTED": {
-    "message": [
-      "Attribute `<attr_name>` is not supported in Spark Connect as it depends on the JVM. If you need to use this attribute, do not use Spark Connect when creating your session. Visit https://spark.apache.org/docs/latest/sql-getting-started.html#starting-point-sparksession for creating regular Spark Session in detail."
-    ]
-  },
-  "KEY_NOT_EXISTS": {
-    "message": [
-      "Key `<key>` is not exists."
-    ]
-  },
-  "KEY_VALUE_PAIR_REQUIRED": {
-    "message": [
-      "Key-value pair or a list of pairs is required."
-    ]
-  },
-  "LENGTH_SHOULD_BE_THE_SAME": {
-    "message": [
-      "<arg1> and <arg2> should be of the same length, got <arg1_length> and <arg2_length>."
-    ]
-  },
-  "MASTER_URL_NOT_SET": {
-    "message": [
-      "A master URL must be set in your configuration."
-    ]
-  },
-  "MISSING_LIBRARY_FOR_PROFILER": {
-    "message": [
-      "Install the 'memory_profiler' library in the cluster to enable memory profiling."
-    ]
-  },
-  "MISSING_VALID_PLAN": {
-    "message": [
-      "Argument to <operator> does not contain a valid plan."
-    ]
-  },
-  "MIXED_TYPE_REPLACEMENT": {
-    "message": [
-      "Mixed type replacements are not supported."
-    ]
-  },
-  "NEGATIVE_VALUE": {
-    "message": [
-      "Value for `<arg_name>` must be greater than or equal to 0, got '<arg_value>'."
-    ]
-  },
-  "NOT_BOOL": {
-    "message": [
-      "Argument `<arg_name>` should be a bool, got <arg_type>."
-    ]
-  },
-  "NOT_BOOL_OR_DICT_OR_FLOAT_OR_INT_OR_LIST_OR_STR_OR_TUPLE": {
-    "message": [
-      "Argument `<arg_name>` should be a bool, dict, float, int, str or tuple, got <arg_type>."
-    ]
-  },
-  "NOT_BOOL_OR_DICT_OR_FLOAT_OR_INT_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a bool, dict, float, int or str, got <arg_type>."
-    ]
-  },
-  "NOT_BOOL_OR_FLOAT_OR_INT": {
-    "message": [
-      "Argument `<arg_name>` should be a bool, float or int, got <arg_type>."
-    ]
-  },
-  "NOT_BOOL_OR_FLOAT_OR_INT_OR_LIST_OR_NONE_OR_STR_OR_TUPLE": {
-    "message": [
-      "Argument `<arg_name>` should be a bool, float, int, list, None, str or tuple, got <arg_type>."
-    ]
-  },
-  "NOT_BOOL_OR_FLOAT_OR_INT_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a bool, float, int or str, got <arg_type>."
-    ]
-  },
-  "NOT_BOOL_OR_LIST": {
-    "message": [
-      "Argument `<arg_name>` should be a bool or list, got <arg_type>."
-    ]
-  },
-  "NOT_BOOL_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a bool or str, got <arg_type>."
-    ]
-  },
-  "NOT_CALLABLE": {
-    "message": [
-      "Argument `<arg_name>` should be a callable, got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN": {
-    "message": [
-      "Argument `<arg_name>` should be a Column, got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN_OR_DATATYPE_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a Column, str or DataType, but got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN_OR_FLOAT_OR_INT_OR_LIST_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a Column, float, integer, list or string, got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN_OR_INT": {
-    "message": [
-      "Argument `<arg_name>` should be a Column or int, got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN_OR_INT_OR_LIST_OR_STR_OR_TUPLE": {
-    "message": [
-      "Argument `<arg_name>` should be a Column, int, list, str or tuple, got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN_OR_INT_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a Column, int or str, got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN_OR_LIST_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a Column, list or str, got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a Column or str, got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN_OR_STR_OR_STRUCT": {
-    "message": [
-      "Argument `<arg_name>` should be a StructType, Column or str, got <arg_type>."
-    ]
-  },
-  "NOT_DATAFRAME": {
-    "message": [
-      "Argument `<arg_name>` should be a DataFrame, got <arg_type>."
-    ]
-  },
-  "NOT_DATATYPE_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a DataType or str, got <arg_type>."
-    ]
-  },
-  "NOT_DICT": {
-    "message": [
-      "Argument `<arg_name>` should be a dict, got <arg_type>."
-    ]
-  },
-  "NOT_EXPRESSION": {
-    "message": [
-      "Argument `<arg_name>` should be an Expression, got <arg_type>."
-    ]
-  },
-  "NOT_FLOAT_OR_INT": {
-    "message": [
-      "Argument `<arg_name>` should be a float or int, got <arg_type>."
-    ]
-  },
-  "NOT_FLOAT_OR_INT_OR_LIST_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a float, int, list or str, got <arg_type>."
-    ]
-  },
-  "NOT_IMPLEMENTED": {
-    "message": [
-      "<feature> is not implemented."
-    ]
-  },
-  "NOT_INT": {
-    "message": [
-      "Argument `<arg_name>` should be an int, got <arg_type>."
-    ]
-  },
-  "NOT_INT_OR_SLICE_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be an int, slice or str, got <arg_type>."
-    ]
-  },
-  "NOT_IN_BARRIER_STAGE": {
-    "message": [
-      "It is not in a barrier stage."
-    ]
-  },
-  "NOT_ITERABLE": {
-    "message": [
-      "<objectName> is not iterable."
-    ]
-  },
-  "NOT_LIST": {
-    "message": [
-      "Argument `<arg_name>` should be a list, got <arg_type>."
-    ]
-  },
-  "NOT_LIST_OF_COLUMN": {
-    "message": [
-      "Argument `<arg_name>` should be a list[Column]."
-    ]
-  },
-  "NOT_LIST_OF_COLUMN_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a list[Column]."
-    ]
-  },
-  "NOT_LIST_OF_FLOAT_OR_INT": {
-    "message": [
-      "Argument `<arg_name>` should be a list[float, int], got <arg_type>."
-    ]
-  },
-  "NOT_LIST_OF_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a list[str], got <arg_type>."
-    ]
-  },
-  "NOT_LIST_OR_NONE_OR_STRUCT": {
-    "message": [
-      "Argument `<arg_name>` should be a list, None or StructType, got <arg_type>."
-    ]
-  },
-  "NOT_LIST_OR_STR_OR_TUPLE": {
-    "message": [
-      "Argument `<arg_name>` should be a list, str or tuple, got <arg_type>."
-    ]
-  },
-  "NOT_LIST_OR_TUPLE": {
-    "message": [
-      "Argument `<arg_name>` should be a list or tuple, got <arg_type>."
-    ]
-  },
-  "NOT_NUMERIC_COLUMNS": {
-    "message": [
-      "Numeric aggregation function can only be applied on numeric columns, got <invalid_columns>."
-    ]
-  },
-  "NOT_OBSERVATION_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be an Observation or str, got <arg_type>."
-    ]
-  },
-  "NOT_SAME_TYPE": {
-    "message": [
-      "Argument `<arg_name1>` and `<arg_name2>` should be the same type, got <arg_type1> and <arg_type2>."
-    ]
-  },
-  "NOT_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a str, got <arg_type>."
-    ]
-  },
-  "NOT_STRUCT": {
-    "message": [
-      "Argument `<arg_name>` should be a struct type, got <arg_type>."
-    ]
-  },
-  "NOT_STR_OR_LIST_OF_RDD": {
-    "message": [
-      "Argument `<arg_name>` should be a str or list[RDD], got <arg_type>."
-    ]
-  },
-  "NOT_STR_OR_STRUCT": {
-    "message": [
-      "Argument `<arg_name>` should be a str or struct type, got <arg_type>."
-    ]
-  },
-  "NOT_WINDOWSPEC": {
-    "message": [
-      "Argument `<arg_name>` should be a WindowSpec, got <arg_type>."
-    ]
-  },
-  "NO_ACTIVE_EXCEPTION": {
-    "message": [
-      "No active exception."
-    ]
-  },
-  "NO_ACTIVE_OR_DEFAULT_SESSION": {
-    "message": [
-      "No active or default Spark session found. Please create a new Spark session before running the code."
-    ]
-  },
-  "NO_ACTIVE_SESSION": {
-    "message": [
-      "No active Spark session found. Please create a new Spark session before running the code."
-    ]
-  },
-  "NO_OBSERVE_BEFORE_GET": {
-    "message": [
-      "Should observe by calling `DataFrame.observe` before `get`."
-    ]
-  },
-  "NO_SCHEMA_AND_DRIVER_DEFAULT_SCHEME": {
-    "message": [
-      "Only allows <arg_name> to be a path without scheme, and Spark Driver should use the default scheme to determine the destination file system."
-    ]
-  },
-  "ONLY_ALLOWED_FOR_SINGLE_COLUMN": {
-    "message": [
-      "Argument `<arg_name>` can only be provided for a single column."
-    ]
-  },
-  "ONLY_ALLOW_SINGLE_TRIGGER": {
-    "message": [
-      "Only a single trigger is allowed."
-    ]
-  },
-  "ONLY_SUPPORTED_WITH_SPARK_CONNECT": {
-    "message": [
-      "<feature> is only supported with Spark Connect; however, the current Spark session does not use Spark Connect."
-    ]
-  },
-  "PACKAGE_NOT_INSTALLED": {
-    "message": [
-      "<package_name> >= <minimum_version> must be installed; however, it was not found."
-    ]
-  },
-  "PIPE_FUNCTION_EXITED": {
-    "message": [
-      "Pipe function `<func_name>` exited with error code <error_code>."
-    ]
-  },
-  "PYTHON_HASH_SEED_NOT_SET": {
-    "message": [
-      "Randomness of hash of string should be disabled via PYTHONHASHSEED."
-    ]
-  },
-  "PYTHON_STREAMING_DATA_SOURCE_RUNTIME_ERROR": {
-    "message": [
-      "Failed when running Python streaming data source: <msg>"
-    ]
-  },
-  "PYTHON_VERSION_MISMATCH": {
-    "message": [
-      "Python in worker has different version: <worker_version> than that in driver: <driver_version>, PySpark cannot run with different minor versions.",
-      "Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set."
-    ]
-  },
-  "RDD_TRANSFORM_ONLY_VALID_ON_DRIVER": {
-    "message": [
-      "It appears that you are attempting to broadcast an RDD or reference an RDD from an ",
-      "action or transformation. RDD transformations and actions can only be invoked by the ",
-      "driver, not inside of other transformations; for example, ",
-      "rdd1.map(lambda x: rdd2.values.count() * x) is invalid because the values ",
-      "transformation and count action cannot be performed inside of the rdd1.map ",
-      "transformation. For more information, see SPARK-5063."
-    ]
-  },
-  "READ_ONLY": {
-    "message": [
-      "<object> is read-only."
-    ]
-  },
-  "RESPONSE_ALREADY_RECEIVED": {
-    "message": [
-      "OPERATION_NOT_FOUND on the server but responses were already received from it."
-    ]
-  },
-  "RESULT_COLUMNS_MISMATCH_FOR_ARROW_UDF": {
-    "message": [
-      "Column names of the returned pyarrow.Table do not match specified schema.<missing><extra>"
-    ]
-  },
-  "RESULT_COLUMNS_MISMATCH_FOR_PANDAS_UDF": {
-    "message": [
-      "Column names of the returned pandas.DataFrame do not match specified schema.<missing><extra>"
-    ]
-  },
-  "RESULT_LENGTH_MISMATCH_FOR_PANDAS_UDF": {
-    "message": [
-      "Number of columns of the returned pandas.DataFrame doesn't match specified schema. Expected: <expected> Actual: <actual>"
-    ]
-  },
-  "RESULT_LENGTH_MISMATCH_FOR_SCALAR_ITER_PANDAS_UDF": {
-    "message": [
-      "The length of output in Scalar iterator pandas UDF should be the same with the input's; however, the length of output was <output_length> and the length of input was <input_length>."
-    ]
-  },
-  "RESULT_TYPE_MISMATCH_FOR_ARROW_UDF": {
-    "message": [
-      "Columns do not match in their data type: <mismatch>."
-    ]
-  },
-  "RETRIES_EXCEEDED": {
-    "message": [
-      "The maximum number of retries has been exceeded."
-    ]
-  },
-  "REUSE_OBSERVATION": {
-    "message": [
-      "An Observation can be used with a DataFrame only once."
-    ]
-  },
-  "SCHEMA_MISMATCH_FOR_PANDAS_UDF": {
-    "message": [
-      "Result vector from pandas_udf was not the required length: expected <expected>, got <actual>."
-    ]
-  },
-  "SESSION_ALREADY_EXIST": {
-    "message": [
-      "Cannot start a remote Spark session because there is a regular Spark session already running."
-    ]
-  },
-  "SESSION_NEED_CONN_STR_OR_BUILDER": {
-    "message": [
-      "Needs either connection string or channelBuilder (mutually exclusive) to create a new SparkSession."
-    ]
-  },
-  "SESSION_NOT_SAME": {
-    "message": [
-      "Both Datasets must belong to the same SparkSession."
-    ]
-  },
-  "SESSION_OR_CONTEXT_EXISTS": {
-    "message": [
-      "There should not be an existing Spark Session or Spark Context."
-    ]
-  },
-  "SESSION_OR_CONTEXT_NOT_EXISTS": {
-    "message": [
-      "SparkContext or SparkSession should be created first."
-    ]
-  },
-  "SLICE_WITH_STEP": {
-    "message": [
-      "Slice with step is not supported."
-    ]
-  },
-  "STATE_NOT_EXISTS": {
-    "message": [
-      "State is either not defined or has already been removed."
-    ]
-  },
-  "STOP_ITERATION_OCCURRED": {
-    "message": [
-      "Caught StopIteration thrown from user's code; failing the task: <exc>"
-    ]
-  },
-  "STOP_ITERATION_OCCURRED_FROM_SCALAR_ITER_PANDAS_UDF": {
-    "message": [
-      "pandas iterator UDF should exhaust the input iterator."
-    ]
-  },
-  "STREAMING_CONNECT_SERIALIZATION_ERROR": {
-    "message": [
-      "Cannot serialize the function `<name>`. If you accessed the Spark session, or a DataFrame defined outside of the function, or any object that contains a Spark session, please be aware that they are not allowed in Spark Connect. For `foreachBatch`, please access the Spark session using `df.sparkSession`, where `df` is the first parameter in your `foreachBatch` function. For `StreamingQueryListener`, please access the Spark session using `self.spark`. For details please check out the PySpark doc for `foreachBatch` and `StreamingQueryListener`."
-    ]
-  },
-  "TEST_CLASS_NOT_COMPILED": {
-    "message": [
-      "<test_class_path> doesn't exist. Spark sql test classes are not compiled."
-    ]
-  },
-  "TOO_MANY_VALUES": {
-    "message": [
-      "Expected <expected> values for `<item>`, got <actual>."
-    ]
-  },
-  "TYPE_HINT_SHOULD_BE_SPECIFIED": {
-    "message": [
-      "Type hints for <target> should be specified; however, got <sig>."
-    ]
-  },
-  "UDF_RETURN_TYPE": {
-    "message": [
-      "Return type of the user-defined function should be <expected>, but is <actual>."
-    ]
-  },
-  "UDTF_ARROW_TYPE_CAST_ERROR": {
-    "message": [
-      "Cannot convert the output value of the column '<col_name>' with type '<col_type>' to the specified return type of the column: '<arrow_type>'. Please check if the data types match and try again."
-    ]
-  },
-  "UDTF_CONSTRUCTOR_INVALID_IMPLEMENTS_ANALYZE_METHOD": {
-    "message": [
-      "Failed to evaluate the user-defined table function '<name>' because its constructor is invalid: the function implements the 'analyze' method, but its constructor has more than two arguments (including the 'self' reference). Please update the table function so that its constructor accepts exactly one 'self' argument, or one 'self' argument plus another argument for the result of the 'analyze' method, and try the query again."
-    ]
-  },
-  "UDTF_CONSTRUCTOR_INVALID_NO_ANALYZE_METHOD": {
-    "message": [
-      "Failed to evaluate the user-defined table function '<name>' because its constructor is invalid: the function does not implement the 'analyze' method, and its constructor has more than one argument (including the 'self' reference). Please update the table function so that its constructor accepts exactly one 'self' argument, and try the query again."
-    ]
-  },
-  "UDTF_EVAL_METHOD_ARGUMENTS_DO_NOT_MATCH_SIGNATURE": {
-    "message": [
-      "Failed to evaluate the user-defined table function '<name>' because the function arguments did not match the expected signature of the 'eval' method (<reason>). Please update the query so that this table function call provides arguments matching the expected signature, or else update the table function so that its 'eval' method accepts the provided arguments, and then try the query again."
-    ]
-  },
-  "UDTF_EXEC_ERROR": {
-    "message": [
-      "User defined table function encountered an error in the '<method_name>' method: <error>"
-    ]
-  },
-  "UDTF_INVALID_OUTPUT_ROW_TYPE": {
-    "message": [
-      "The type of an individual output row in the '<func>' method of the UDTF is invalid. Each row should be a tuple, list, or dict, but got '<type>'. Please make sure that the output rows are of the correct type."
-    ]
-  },
-  "UDTF_RETURN_NOT_ITERABLE": {
-    "message": [
-      "The return value of the '<func>' method of the UDTF is invalid. It should be an iterable (e.g., generator or list), but got '<type>'. Please make sure that the UDTF returns one of these types."
-    ]
-  },
-  "UDTF_RETURN_SCHEMA_MISMATCH": {
-    "message": [
-      "The number of columns in the result does not match the specified schema. Expected column count: <expected>, Actual column count: <actual>. Please make sure the values returned by the '<func>' method have the same number of columns as specified in the output schema."
-    ]
-  },
-  "UDTF_RETURN_TYPE_MISMATCH": {
-    "message": [
-      "Mismatch in return type for the UDTF '<name>'. Expected a 'StructType', but got '<return_type>'. Please ensure the return type is a correctly formatted StructType."
-    ]
-  },
-  "UDTF_SERIALIZATION_ERROR": {
-    "message": [
-      "Cannot serialize the UDTF '<name>': <message>"
-    ]
-  },
-  "UNEXPECTED_RESPONSE_FROM_SERVER": {
-    "message": [
-      "Unexpected response from iterator server."
-    ]
-  },
-  "UNEXPECTED_TUPLE_WITH_STRUCT": {
-    "message": [
-      "Unexpected tuple <tuple> with StructType."
-    ]
-  },
-  "UNKNOWN_EXPLAIN_MODE": {
-    "message": [
-      "Unknown explain mode: '<explain_mode>'. Accepted explain modes are 'simple', 'extended', 'codegen', 'cost', 'formatted'."
-    ]
-  },
-  "UNKNOWN_INTERRUPT_TYPE": {
-    "message": [
-      "Unknown interrupt type: '<interrupt_type>'. Accepted interrupt types are 'all'."
-    ]
-  },
-  "UNKNOWN_RESPONSE": {
-    "message": [
-      "Unknown response: <response>."
-    ]
-  },
-  "UNKNOWN_VALUE_FOR": {
-    "message": [
-      "Unknown value for `<var>`."
-    ]
-  },
-  "UNSUPPORTED_DATA_TYPE": {
-    "message": [
-      "Unsupported DataType `<data_type>`."
-    ]
-  },
-  "UNSUPPORTED_DATA_TYPE_FOR_ARROW": {
-    "message": [
-      "Single data type <data_type> is not supported with Arrow."
-    ]
-  },
-  "UNSUPPORTED_DATA_TYPE_FOR_ARROW_CONVERSION": {
-    "message": [
-      "<data_type> is not supported in conversion to Arrow."
-    ]
-  },
-  "UNSUPPORTED_DATA_TYPE_FOR_ARROW_VERSION": {
-    "message": [
-      "<data_type> is only supported with pyarrow 2.0.0 and above."
-    ]
-  },
-  "UNSUPPORTED_JOIN_TYPE": {
-    "message": [
-      "Unsupported join type: <join_type>. Supported join types include: 'inner', 'outer', 'full', 'fullouter', 'full_outer', 'leftouter', 'left', 'left_outer', 'rightouter', 'right', 'right_outer', 'leftsemi', 'left_semi', 'semi', 'leftanti', 'left_anti', 'anti', 'cross'."
-    ]
-  },
-  "UNSUPPORTED_LITERAL": {
-    "message": [
-      "Unsupported Literal '<literal>'."
-    ]
-  },
-  "UNSUPPORTED_LOCAL_CONNECTION_STRING": {
-    "message": [
-      "Creating new SparkSessions with `local` connection string is not supported."
-    ]
-  },
-  "UNSUPPORTED_NUMPY_ARRAY_SCALAR": {
-    "message": [
-      "The type of array scalar '<dtype>' is not supported."
-    ]
-  },
-  "UNSUPPORTED_OPERATION": {
-    "message": [
-      "<operation> is not supported."
-    ]
-  },
-  "UNSUPPORTED_PACKAGE_VERSION": {
-    "message": [
-      "<package_name> >= <minimum_version> must be installed; however, your version is <current_version>."
-    ]
-  },
-  "UNSUPPORTED_PARAM_TYPE_FOR_HIGHER_ORDER_FUNCTION": {
-    "message": [
-      "Function `<func_name>` should use only POSITIONAL or POSITIONAL OR KEYWORD arguments."
-    ]
-  },
-  "UNSUPPORTED_SIGNATURE": {
-    "message": [
-      "Unsupported signature: <signature>."
-    ]
-  },
-  "UNSUPPORTED_WITH_ARROW_OPTIMIZATION": {
-    "message": [
-      "<feature> is not supported with Arrow optimization enabled in Python UDFs. Disable 'spark.sql.execution.pythonUDF.arrow.enabled' to workaround."
-    ]
-  },
-  "VALUE_ALLOWED": {
-    "message": [
-      "Value for `<arg_name>` does not allow <disallowed_value>."
-    ]
-  },
-  "VALUE_NOT_ACCESSIBLE": {
-    "message": [
-      "Value `<value>` cannot be accessed inside tasks."
-    ]
-  },
-  "VALUE_NOT_ALLOWED": {
-    "message": [
-      "Value for `<arg_name>` has to be amongst the following values: <allowed_values>."
-    ]
-  },
-  "VALUE_NOT_ANY_OR_ALL": {
-    "message": [
-      "Value for `<arg_name>` must be 'any' or 'all', got '<arg_value>'."
-    ]
-  },
-  "VALUE_NOT_BETWEEN": {
-    "message": [
-      "Value for `<arg_name>` must be between <min> and <max>."
-    ]
-  },
-  "VALUE_NOT_NON_EMPTY_STR": {
-    "message": [
-      "Value for `<arg_name>` must be a non-empty string, got '<arg_value>'."
-    ]
-  },
-  "VALUE_NOT_PEARSON": {
-    "message": [
-      "Value for `<arg_name>` only supports the 'pearson', got '<arg_value>'."
-    ]
-  },
-  "VALUE_NOT_PLAIN_COLUMN_REFERENCE": {
-    "message": [
-      "Value `<val>` in `<field_name>` should be a plain column reference such as `df.col` or `col('column')`."
-    ]
-  },
-  "VALUE_NOT_POSITIVE": {
-    "message": [
-      "Value for `<arg_name>` must be positive, got '<arg_value>'."
-    ]
-  },
-  "VALUE_NOT_TRUE": {
-    "message": [
-      "Value for `<arg_name>` must be True, got '<arg_value>'."
-    ]
-  },
-  "VALUE_OUT_OF_BOUNDS": {
-    "message": [
-      "Value for `<arg_name>` must be between <lower_bound> and <upper_bound> (inclusive), got <actual>"
-    ]
-  },
-  "WRONG_NUM_ARGS_FOR_HIGHER_ORDER_FUNCTION": {
-    "message": [
-      "Function `<func_name>` should take between 1 and 3 arguments, but the provided function takes <num_args>."
-    ]
-  },
-  "WRONG_NUM_COLUMNS": {
-    "message": [
-      "Function `<func_name>` should take at least <num_cols> columns."
-    ]
-  },
-  "ZERO_INDEX": {
-    "message": [
-      "Index must be non-zero."
-    ]
-  }
-}
-'''
-
+# Note: Though we call them "error classes" here, the proper name is "error conditions",
+#   hence why the name of the JSON file is different.
+#   For more information, please see: https://issues.apache.org/jira/browse/SPARK-46810
+#   This discrepancy will be resolved as part of: https://issues.apache.org/jira/browse/SPARK-47429
+# Note: When we drop support for Python 3.8, we should migrate from importlib.resources.read_text()

Review Comment:
   Updated. I also reran test 5 from the PR description.
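
   For context: the note in the hunk above refers to the two stdlib APIs for reading packaged data files. A minimal sketch of both, assuming pyspark is installed with the JSON packaged under `pyspark.errors` (illustrative only, not code from this PR):

   ```python
   import importlib.resources
   import json
   import sys

   def load_error_conditions() -> dict:
       # Read the JSON that ships inside the pyspark.errors package.
       if sys.version_info >= (3, 9):
           # files() is the modern API; read_text(pkg, name) is deprecated since 3.11.
           text = (
               importlib.resources.files("pyspark.errors")
               .joinpath("error-conditions.json")
               .read_text()
           )
       else:
           # Python 3.8 fallback, matching the approach referenced in the note above.
           text = importlib.resources.read_text("pyspark.errors", "error-conditions.json")
       return json.loads(text)
   ```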





Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #44920:
URL: https://github.com/apache/spark/pull/44920#discussion_r1587017977


##########
python/MANIFEST.in:
##########
@@ -14,13 +14,18 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-global-exclude *.py[cod] __pycache__ .DS_Store
+# Reference: https://setuptools.pypa.io/en/latest/userguide/miscellaneous.html
+
+graft pyspark

Review Comment:
   I would appreciate it if you could make another PR :-)
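
   For context: `graft pyspark` tells setuptools to include every file under the pyspark/ directory in the source distribution (whether those files also land in a wheel additionally depends on settings such as `include_package_data`). A quick, illustrative way to check that a built wheel actually contains the data file; the wheel path below is hypothetical:

   ```python
   import zipfile

   def wheel_has_error_conditions(wheel_path: str) -> bool:
       # Wheels are zip archives; scan the member list for the data file.
       with zipfile.ZipFile(wheel_path) as whl:
           return any(
               name.endswith("pyspark/errors/error-conditions.json")
               for name in whl.namelist()
           )

   # Hypothetical usage:
   # print(wheel_has_error_conditions("dist/pyspark-4.0.0.dev0-py3-none-any.whl"))
   ```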





Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

Posted by "nchammas (via GitHub)" <gi...@apache.org>.
nchammas commented on code in PR #44920:
URL: https://github.com/apache/spark/pull/44920#discussion_r1469003292


##########
python/pyspark/errors/error_classes.py:
##########
@@ -15,1110 +15,16 @@
 # limitations under the License.
 #
 
-# NOTE: Automatically sort this file via
-# - cd $SPARK_HOME
-# - bin/pyspark
-# - from pyspark.errors.exceptions import _write_self; _write_self()
 import json
+from pathlib import Path
 
+THIS_DIR = Path(__file__).parent
+# Note that though we call them "error classes" here, the proper name is "error conditions",
+# hence why the name of the JSON file is different.
+# For more information, please see: https://issues.apache.org/jira/browse/SPARK-46810
+ERROR_CONDITIONS_PATH = THIS_DIR / "error-conditions.json"

Review Comment:
   This comment is related to the work being done in #44902, by the way.
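
   For context on the hunk above: it resolves the JSON relative to the module on disk. A minimal sketch of that approach, assuming the package is installed as regular files (zipped installs are the case importlib.resources handles better):

   ```python
   import json
   from pathlib import Path

   # Resolve the JSON next to this module, mirroring the hunk above.
   ERROR_CONDITIONS_PATH = Path(__file__).parent / "error-conditions.json"
   ERROR_CLASSES_MAP = json.loads(ERROR_CONDITIONS_PATH.read_text())
   ```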





Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

Posted by "nchammas (via GitHub)" <gi...@apache.org>.
nchammas commented on code in PR #44920:
URL: https://github.com/apache/spark/pull/44920#discussion_r1469000326


##########
python/pyspark/errors/error-conditions.json:
##########
@@ -0,0 +1,1096 @@
+{

Review Comment:
   Confirmed. I've updated the PR description accordingly.
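
   For context: the JSON maps condition names to message templates containing `<placeholder>` tokens. A rough illustration of how such a template can be rendered; PySpark's actual error-message reader may differ:

   ```python
   import re

   def render(template_lines, params):
       # Join multi-line templates, then substitute <name> tokens from params;
       # unknown tokens are left as-is.
       template = "\n".join(template_lines)
       return re.sub(r"<(\w+)>", lambda m: str(params.get(m.group(1), m.group(0))), template)

   # Example with a condition from the file above:
   render(["<operation> is not supported."], {"operation": "pivot"})
   # -> 'pivot is not supported.'
   ```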





Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

Posted by "nchammas (via GitHub)" <gi...@apache.org>.
nchammas commented on code in PR #44920:
URL: https://github.com/apache/spark/pull/44920#discussion_r1470602522


##########
python/pyspark/errors/error_classes.py:
##########
@@ -15,1110 +15,14 @@
 # limitations under the License.
 #
 
-# NOTE: Automatically sort this file via
-# - cd $SPARK_HOME
-# - bin/pyspark
-# - from pyspark.errors.exceptions import _write_self; _write_self()
 import json
+import importlib.resources
 
-
-ERROR_CLASSES_JSON = '''
-{
-  "APPLICATION_NAME_NOT_SET": {
-    "message": [
-      "An application name must be set in your configuration."
-    ]
-  },
-  "ARGUMENT_REQUIRED": {
-    "message": [
-      "Argument `<arg_name>` is required when <condition>."
-    ]
-  },
-  "ARROW_LEGACY_IPC_FORMAT": {
-    "message": [
-      "Arrow legacy IPC format is not supported in PySpark, please unset ARROW_PRE_0_15_IPC_FORMAT."
-    ]
-  },
-  "ATTRIBUTE_NOT_CALLABLE": {
-    "message": [
-      "Attribute `<attr_name>` in provided object `<obj_name>` is not callable."
-    ]
-  },
-  "ATTRIBUTE_NOT_SUPPORTED": {
-    "message": [
-      "Attribute `<attr_name>` is not supported."
-    ]
-  },
-  "AXIS_LENGTH_MISMATCH": {
-    "message": [
-      "Length mismatch: Expected axis has <expected_length> element, new values have <actual_length> elements."
-    ]
-  },
-  "BROADCAST_VARIABLE_NOT_LOADED": {
-    "message": [
-      "Broadcast variable `<variable>` not loaded."
-    ]
-  },
-  "CALL_BEFORE_INITIALIZE": {
-    "message": [
-      "Not supported to call `<func_name>` before initialize <object>."
-    ]
-  },
-  "CANNOT_ACCEPT_OBJECT_IN_TYPE": {
-    "message": [
-      "`<data_type>` can not accept object `<obj_name>` in type `<obj_type>`."
-    ]
-  },
-  "CANNOT_ACCESS_TO_DUNDER": {
-    "message": [
-      "Dunder(double underscore) attribute is for internal use only."
-    ]
-  },
-  "CANNOT_APPLY_IN_FOR_COLUMN": {
-    "message": [
-      "Cannot apply 'in' operator against a column: please use 'contains' in a string column or 'array_contains' function for an array column."
-    ]
-  },
-  "CANNOT_BE_EMPTY": {
-    "message": [
-      "At least one <item> must be specified."
-    ]
-  },
-  "CANNOT_BE_NONE": {
-    "message": [
-      "Argument `<arg_name>` cannot be None."
-    ]
-  },
-  "CANNOT_CONFIGURE_SPARK_CONNECT": {
-    "message": [
-      "Spark Connect server cannot be configured: Existing [<existing_url>], New [<new_url>]."
-    ]
-  },
-  "CANNOT_CONFIGURE_SPARK_CONNECT_MASTER": {
-    "message": [
-      "Spark Connect server and Spark master cannot be configured together: Spark master [<master_url>], Spark Connect [<connect_url>]."
-    ]
-  },
-  "CANNOT_CONVERT_COLUMN_INTO_BOOL": {
-    "message": [
-      "Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions."
-    ]
-  },
-  "CANNOT_CONVERT_TYPE": {
-    "message": [
-      "Cannot convert <from_type> into <to_type>."
-    ]
-  },
-  "CANNOT_DETERMINE_TYPE": {
-    "message": [
-      "Some of types cannot be determined after inferring."
-    ]
-  },
-  "CANNOT_GET_BATCH_ID": {
-    "message": [
-      "Could not get batch id from <obj_name>."
-    ]
-  },
-  "CANNOT_INFER_ARRAY_TYPE": {
-    "message": [
-      "Can not infer Array Type from a list with None as the first element."
-    ]
-  },
-  "CANNOT_INFER_EMPTY_SCHEMA": {
-    "message": [
-      "Can not infer schema from an empty dataset."
-    ]
-  },
-  "CANNOT_INFER_SCHEMA_FOR_TYPE": {
-    "message": [
-      "Can not infer schema for type: `<data_type>`."
-    ]
-  },
-  "CANNOT_INFER_TYPE_FOR_FIELD": {
-    "message": [
-      "Unable to infer the type of the field `<field_name>`."
-    ]
-  },
-  "CANNOT_MERGE_TYPE": {
-    "message": [
-      "Can not merge type `<data_type1>` and `<data_type2>`."
-    ]
-  },
-  "CANNOT_OPEN_SOCKET": {
-    "message": [
-      "Can not open socket: <errors>."
-    ]
-  },
-  "CANNOT_PARSE_DATATYPE": {
-    "message": [
-      "Unable to parse datatype. <msg>."
-    ]
-  },
-  "CANNOT_PROVIDE_METADATA": {
-    "message": [
-      "Metadata can only be provided for a single column."
-    ]
-  },
-  "CANNOT_SET_TOGETHER": {
-    "message": [
-      "<arg_list> should not be set together."
-    ]
-  },
-  "CANNOT_SPECIFY_RETURN_TYPE_FOR_UDF": {
-    "message": [
-      "returnType can not be specified when `<arg_name>` is a user-defined function, but got <return_type>."
-    ]
-  },
-  "CANNOT_WITHOUT": {
-    "message": [
-      "Cannot <condition1> without <condition2>."
-    ]
-  },
-  "COLUMN_IN_LIST": {
-    "message": [
-      "`<func_name>` does not allow a Column in a list."
-    ]
-  },
-  "CONNECT_URL_ALREADY_DEFINED": {
-    "message": [
-      "Only one Spark Connect client URL can be set; however, got a different URL [<new_url>] from the existing [<existing_url>]."
-    ]
-  },
-  "CONNECT_URL_NOT_SET": {
-    "message": [
-      "Cannot create a Spark Connect session because the Spark Connect remote URL has not been set. Please define the remote URL by setting either the 'spark.remote' option or the 'SPARK_REMOTE' environment variable."
-    ]
-  },
-  "CONTEXT_ONLY_VALID_ON_DRIVER": {
-    "message": [
-      "It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063."
-    ]
-  },
-  "CONTEXT_UNAVAILABLE_FOR_REMOTE_CLIENT": {
-    "message": [
-      "Remote client cannot create a SparkContext. Create SparkSession instead."
-    ]
-  },
-  "DATA_SOURCE_INVALID_RETURN_TYPE": {
-    "message": [
-      "Unsupported return type ('<type>') from Python data source '<name>'. Expected types: <supported_types>."
-    ]
-  },
-  "DATA_SOURCE_RETURN_SCHEMA_MISMATCH": {
-    "message": [
-      "Return schema mismatch in the result from 'read' method. Expected: <expected> columns, Found: <actual> columns. Make sure the returned values match the required output schema."
-    ]
-  },
-  "DATA_SOURCE_TYPE_MISMATCH": {
-    "message": [
-      "Expected <expected>, but got <actual>."
-    ]
-  },
-  "DIFFERENT_PANDAS_DATAFRAME": {
-    "message": [
-      "DataFrames are not almost equal:",
-      "Left:",
-      "<left>",
-      "<left_dtype>",
-      "Right:",
-      "<right>",
-      "<right_dtype>"
-    ]
-  },
-  "DIFFERENT_PANDAS_INDEX": {
-    "message": [
-      "Indices are not almost equal:",
-      "Left:",
-      "<left>",
-      "<left_dtype>",
-      "Right:",
-      "<right>",
-      "<right_dtype>"
-    ]
-  },
-  "DIFFERENT_PANDAS_MULTIINDEX": {
-    "message": [
-      "MultiIndices are not almost equal:",
-      "Left:",
-      "<left>",
-      "<left_dtype>",
-      "Right:",
-      "<right>",
-      "<right_dtype>"
-    ]
-  },
-  "DIFFERENT_PANDAS_SERIES": {
-    "message": [
-      "Series are not almost equal:",
-      "Left:",
-      "<left>",
-      "<left_dtype>",
-      "Right:",
-      "<right>",
-      "<right_dtype>"
-    ]
-  },
-  "DIFFERENT_ROWS": {
-    "message": [
-      "<error_msg>"
-    ]
-  },
-  "DIFFERENT_SCHEMA": {
-    "message": [
-      "Schemas do not match.",
-      "--- actual",
-      "+++ expected",
-      "<error_msg>"
-    ]
-  },
-  "DISALLOWED_TYPE_FOR_CONTAINER": {
-    "message": [
-      "Argument `<arg_name>`(type: <arg_type>) should only contain a type in [<allowed_types>], got <item_type>"
-    ]
-  },
-  "DUPLICATED_FIELD_NAME_IN_ARROW_STRUCT": {
-    "message": [
-      "Duplicated field names in Arrow Struct are not allowed, got <field_names>"
-    ]
-  },
-  "ERROR_OCCURRED_WHILE_CALLING": {
-    "message": [
-      "An error occurred while calling <func_name>: <error_msg>."
-    ]
-  },
-  "HIGHER_ORDER_FUNCTION_SHOULD_RETURN_COLUMN": {
-    "message": [
-      "Function `<func_name>` should return Column, got <return_type>."
-    ]
-  },
-  "INCORRECT_CONF_FOR_PROFILE": {
-    "message": [
-      "`spark.python.profile` or `spark.python.profile.memory` configuration",
-      " must be set to `true` to enable Python profile."
-    ]
-  },
-  "INDEX_NOT_POSITIVE": {
-    "message": [
-      "Index must be positive, got '<index>'."
-    ]
-  },
-  "INDEX_OUT_OF_RANGE": {
-    "message": [
-      "<arg_name> index out of range, got '<index>'."
-    ]
-  },
-  "INVALID_ARROW_UDTF_RETURN_TYPE": {
-    "message": [
-      "The return type of the arrow-optimized Python UDTF should be of type 'pandas.DataFrame', but the '<func>' method returned a value of type <return_type> with value: <value>."
-    ]
-  },
-  "INVALID_BROADCAST_OPERATION": {
-    "message": [
-      "Broadcast can only be <operation> in driver."
-    ]
-  },
-  "INVALID_CALL_ON_UNRESOLVED_OBJECT": {
-    "message": [
-      "Invalid call to `<func_name>` on unresolved object."
-    ]
-  },
-  "INVALID_CONNECT_URL": {
-    "message": [
-      "Invalid URL for Spark Connect: <detail>"
-    ]
-  },
-  "INVALID_INTERVAL_CASTING": {
-    "message": [
-      "Interval <start_field> to <end_field> is invalid."
-    ]
-  },
-  "INVALID_ITEM_FOR_CONTAINER": {
-    "message": [
-      "All items in `<arg_name>` should be in <allowed_types>, got <item_type>."
-    ]
-  },
-  "INVALID_MULTIPLE_ARGUMENT_CONDITIONS": {
-    "message": [
-      "[{arg_names}] cannot be <condition>."
-    ]
-  },
-  "INVALID_NDARRAY_DIMENSION": {
-    "message": [
-      "NumPy array input should be of <dimensions> dimensions."
-    ]
-  },
-  "INVALID_NUMBER_OF_DATAFRAMES_IN_GROUP": {
-    "message": [
-      "Invalid number of dataframes in group <dataframes_in_group>."
-    ]
-  },
-  "INVALID_PANDAS_UDF": {
-    "message": [
-      "Invalid function: <detail>"
-    ]
-  },
-  "INVALID_PANDAS_UDF_TYPE": {
-    "message": [
-      "`<arg_name>` should be one of the values from PandasUDFType, got <arg_type>"
-    ]
-  },
-  "INVALID_RETURN_TYPE_FOR_ARROW_UDF": {
-    "message": [
-      "Grouped and Cogrouped map Arrow UDF should return StructType for <eval_type>, got <return_type>."
-    ]
-  },
-  "INVALID_RETURN_TYPE_FOR_PANDAS_UDF": {
-    "message": [
-      "Pandas UDF should return StructType for <eval_type>, got <return_type>."
-    ]
-  },
-  "INVALID_SESSION_UUID_ID": {
-    "message": [
-      "Parameter value <arg_name> must be a valid UUID format: <origin>"
-    ]
-  },
-  "INVALID_TIMEOUT_TIMESTAMP": {
-    "message": [
-      "Timeout timestamp (<timestamp>) cannot be earlier than the current watermark (<watermark>)."
-    ]
-  },
-  "INVALID_TYPE": {
-    "message": [
-      "Argument `<arg_name>` should not be a <arg_type>."
-    ]
-  },
-  "INVALID_TYPENAME_CALL": {
-    "message": [
-      "StructField does not have typeName. Use typeName on its type explicitly instead."
-    ]
-  },
-  "INVALID_TYPE_DF_EQUALITY_ARG": {
-    "message": [
-      "Expected type <expected_type> for `<arg_name>` but got type <actual_type>."
-    ]
-  },
-  "INVALID_UDF_EVAL_TYPE": {
-    "message": [
-      "Eval type for UDF must be <eval_type>."
-    ]
-  },
-  "INVALID_UDTF_BOTH_RETURN_TYPE_AND_ANALYZE": {
-    "message": [
-      "The UDTF '<name>' is invalid. It has both its return type and an 'analyze' attribute. Please make it have one of either the return type or the 'analyze' static method in '<name>' and try again."
-    ]
-  },
-  "INVALID_UDTF_EVAL_TYPE": {
-    "message": [
-      "The eval type for the UDTF '<name>' is invalid. It must be one of <eval_type>."
-    ]
-  },
-  "INVALID_UDTF_HANDLER_TYPE": {
-    "message": [
-      "The UDTF is invalid. The function handler must be a class, but got '<type>'. Please provide a class as the function handler."
-    ]
-  },
-  "INVALID_UDTF_NO_EVAL": {
-    "message": [
-      "The UDTF '<name>' is invalid. It does not implement the required 'eval' method. Please implement the 'eval' method in '<name>' and try again."
-    ]
-  },
-  "INVALID_UDTF_RETURN_TYPE": {
-    "message": [
-      "The UDTF '<name>' is invalid. It does not specify its return type or implement the required 'analyze' static method. Please specify the return type or implement the 'analyze' static method in '<name>' and try again."
-    ]
-  },
-  "INVALID_WHEN_USAGE": {
-    "message": [
-      "when() can only be applied on a Column previously generated by when() function, and cannot be applied once otherwise() is applied."
-    ]
-  },
-  "INVALID_WINDOW_BOUND_TYPE": {
-    "message": [
-      "Invalid window bound type: <window_bound_type>."
-    ]
-  },
-  "JAVA_GATEWAY_EXITED": {
-    "message": [
-      "Java gateway process exited before sending its port number."
-    ]
-  },
-  "JVM_ATTRIBUTE_NOT_SUPPORTED": {
-    "message": [
-      "Attribute `<attr_name>` is not supported in Spark Connect as it depends on the JVM. If you need to use this attribute, do not use Spark Connect when creating your session. Visit https://spark.apache.org/docs/latest/sql-getting-started.html#starting-point-sparksession for creating regular Spark Session in detail."
-    ]
-  },
-  "KEY_NOT_EXISTS": {
-    "message": [
-      "Key `<key>` is not exists."
-    ]
-  },
-  "KEY_VALUE_PAIR_REQUIRED": {
-    "message": [
-      "Key-value pair or a list of pairs is required."
-    ]
-  },
-  "LENGTH_SHOULD_BE_THE_SAME": {
-    "message": [
-      "<arg1> and <arg2> should be of the same length, got <arg1_length> and <arg2_length>."
-    ]
-  },
-  "MASTER_URL_NOT_SET": {
-    "message": [
-      "A master URL must be set in your configuration."
-    ]
-  },
-  "MISSING_LIBRARY_FOR_PROFILER": {
-    "message": [
-      "Install the 'memory_profiler' library in the cluster to enable memory profiling."
-    ]
-  },
-  "MISSING_VALID_PLAN": {
-    "message": [
-      "Argument to <operator> does not contain a valid plan."
-    ]
-  },
-  "MIXED_TYPE_REPLACEMENT": {
-    "message": [
-      "Mixed type replacements are not supported."
-    ]
-  },
-  "NEGATIVE_VALUE": {
-    "message": [
-      "Value for `<arg_name>` must be greater than or equal to 0, got '<arg_value>'."
-    ]
-  },
-  "NOT_BOOL": {
-    "message": [
-      "Argument `<arg_name>` should be a bool, got <arg_type>."
-    ]
-  },
-  "NOT_BOOL_OR_DICT_OR_FLOAT_OR_INT_OR_LIST_OR_STR_OR_TUPLE": {
-    "message": [
-      "Argument `<arg_name>` should be a bool, dict, float, int, str or tuple, got <arg_type>."
-    ]
-  },
-  "NOT_BOOL_OR_DICT_OR_FLOAT_OR_INT_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a bool, dict, float, int or str, got <arg_type>."
-    ]
-  },
-  "NOT_BOOL_OR_FLOAT_OR_INT": {
-    "message": [
-      "Argument `<arg_name>` should be a bool, float or int, got <arg_type>."
-    ]
-  },
-  "NOT_BOOL_OR_FLOAT_OR_INT_OR_LIST_OR_NONE_OR_STR_OR_TUPLE": {
-    "message": [
-      "Argument `<arg_name>` should be a bool, float, int, list, None, str or tuple, got <arg_type>."
-    ]
-  },
-  "NOT_BOOL_OR_FLOAT_OR_INT_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a bool, float, int or str, got <arg_type>."
-    ]
-  },
-  "NOT_BOOL_OR_LIST": {
-    "message": [
-      "Argument `<arg_name>` should be a bool or list, got <arg_type>."
-    ]
-  },
-  "NOT_BOOL_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a bool or str, got <arg_type>."
-    ]
-  },
-  "NOT_CALLABLE": {
-    "message": [
-      "Argument `<arg_name>` should be a callable, got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN": {
-    "message": [
-      "Argument `<arg_name>` should be a Column, got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN_OR_DATATYPE_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a Column, str or DataType, but got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN_OR_FLOAT_OR_INT_OR_LIST_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a Column, float, integer, list or string, got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN_OR_INT": {
-    "message": [
-      "Argument `<arg_name>` should be a Column or int, got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN_OR_INT_OR_LIST_OR_STR_OR_TUPLE": {
-    "message": [
-      "Argument `<arg_name>` should be a Column, int, list, str or tuple, got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN_OR_INT_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a Column, int or str, got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN_OR_LIST_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a Column, list or str, got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a Column or str, got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN_OR_STR_OR_STRUCT": {
-    "message": [
-      "Argument `<arg_name>` should be a StructType, Column or str, got <arg_type>."
-    ]
-  },
-  "NOT_DATAFRAME": {
-    "message": [
-      "Argument `<arg_name>` should be a DataFrame, got <arg_type>."
-    ]
-  },
-  "NOT_DATATYPE_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a DataType or str, got <arg_type>."
-    ]
-  },
-  "NOT_DICT": {
-    "message": [
-      "Argument `<arg_name>` should be a dict, got <arg_type>."
-    ]
-  },
-  "NOT_EXPRESSION": {
-    "message": [
-      "Argument `<arg_name>` should be an Expression, got <arg_type>."
-    ]
-  },
-  "NOT_FLOAT_OR_INT": {
-    "message": [
-      "Argument `<arg_name>` should be a float or int, got <arg_type>."
-    ]
-  },
-  "NOT_FLOAT_OR_INT_OR_LIST_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a float, int, list or str, got <arg_type>."
-    ]
-  },
-  "NOT_IMPLEMENTED": {
-    "message": [
-      "<feature> is not implemented."
-    ]
-  },
-  "NOT_INSTANCE_OF": {
-    "message": [
-      "<value> is not an instance of type <type>."
-    ]
-  },
-  "NOT_INT": {
-    "message": [
-      "Argument `<arg_name>` should be an int, got <arg_type>."
-    ]
-  },
-  "NOT_INT_OR_SLICE_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be an int, slice or str, got <arg_type>."
-    ]
-  },
-  "NOT_IN_BARRIER_STAGE": {
-    "message": [
-      "It is not in a barrier stage."
-    ]
-  },
-  "NOT_ITERABLE": {
-    "message": [
-      "<objectName> is not iterable."
-    ]
-  },
-  "NOT_LIST": {
-    "message": [
-      "Argument `<arg_name>` should be a list, got <arg_type>."
-    ]
-  },
-  "NOT_LIST_OF_COLUMN": {
-    "message": [
-      "Argument `<arg_name>` should be a list[Column]."
-    ]
-  },
-  "NOT_LIST_OF_COLUMN_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a list[Column]."
-    ]
-  },
-  "NOT_LIST_OF_FLOAT_OR_INT": {
-    "message": [
-      "Argument `<arg_name>` should be a list[float, int], got <arg_type>."
-    ]
-  },
-  "NOT_LIST_OF_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a list[str], got <arg_type>."
-    ]
-  },
-  "NOT_LIST_OR_NONE_OR_STRUCT": {
-    "message": [
-      "Argument `<arg_name>` should be a list, None or StructType, got <arg_type>."
-    ]
-  },
-  "NOT_LIST_OR_STR_OR_TUPLE": {
-    "message": [
-      "Argument `<arg_name>` should be a list, str or tuple, got <arg_type>."
-    ]
-  },
-  "NOT_LIST_OR_TUPLE": {
-    "message": [
-      "Argument `<arg_name>` should be a list or tuple, got <arg_type>."
-    ]
-  },
-  "NOT_NUMERIC_COLUMNS": {
-    "message": [
-      "Numeric aggregation function can only be applied on numeric columns, got <invalid_columns>."
-    ]
-  },
-  "NOT_OBSERVATION_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be an Observation or str, got <arg_type>."
-    ]
-  },
-  "NOT_SAME_TYPE": {
-    "message": [
-      "Argument `<arg_name1>` and `<arg_name2>` should be the same type, got <arg_type1> and <arg_type2>."
-    ]
-  },
-  "NOT_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a str, got <arg_type>."
-    ]
-  },
-  "NOT_STRUCT": {
-    "message": [
-      "Argument `<arg_name>` should be a struct type, got <arg_type>."
-    ]
-  },
-  "NOT_STR_OR_LIST_OF_RDD": {
-    "message": [
-      "Argument `<arg_name>` should be a str or list[RDD], got <arg_type>."
-    ]
-  },
-  "NOT_STR_OR_STRUCT": {
-    "message": [
-      "Argument `<arg_name>` should be a str or struct type, got <arg_type>."
-    ]
-  },
-  "NOT_WINDOWSPEC": {
-    "message": [
-      "Argument `<arg_name>` should be a WindowSpec, got <arg_type>."
-    ]
-  },
-  "NO_ACTIVE_EXCEPTION": {
-    "message": [
-      "No active exception."
-    ]
-  },
-  "NO_ACTIVE_OR_DEFAULT_SESSION": {
-    "message": [
-      "No active or default Spark session found. Please create a new Spark session before running the code."
-    ]
-  },
-  "NO_ACTIVE_SESSION": {
-    "message": [
-      "No active Spark session found. Please create a new Spark session before running the code."
-    ]
-  },
-  "NO_OBSERVE_BEFORE_GET": {
-    "message": [
-      "Should observe by calling `DataFrame.observe` before `get`."
-    ]
-  },
-  "NO_SCHEMA_AND_DRIVER_DEFAULT_SCHEME": {
-    "message": [
-      "Only allows <arg_name> to be a path without scheme, and Spark Driver should use the default scheme to determine the destination file system."
-    ]
-  },
-  "ONLY_ALLOWED_FOR_SINGLE_COLUMN": {
-    "message": [
-      "Argument `<arg_name>` can only be provided for a single column."
-    ]
-  },
-  "ONLY_ALLOW_SINGLE_TRIGGER": {
-    "message": [
-      "Only a single trigger is allowed."
-    ]
-  },
-  "ONLY_SUPPORTED_WITH_SPARK_CONNECT": {
-    "message": [
-      "<feature> is only supported with Spark Connect; however, the current Spark session does not use Spark Connect."
-    ]
-  },
-  "PACKAGE_NOT_INSTALLED": {
-    "message": [
-      "<package_name> >= <minimum_version> must be installed; however, it was not found."
-    ]
-  },
-  "PIPE_FUNCTION_EXITED": {
-    "message": [
-      "Pipe function `<func_name>` exited with error code <error_code>."
-    ]
-  },
-  "PYTHON_HASH_SEED_NOT_SET": {
-    "message": [
-      "Randomness of hash of string should be disabled via PYTHONHASHSEED."
-    ]
-  },
-  "PYTHON_VERSION_MISMATCH": {
-    "message": [
-      "Python in worker has different version: <worker_version> than that in driver: <driver_version>, PySpark cannot run with different minor versions.",
-      "Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set."
-    ]
-  },
-  "RDD_TRANSFORM_ONLY_VALID_ON_DRIVER": {
-    "message": [
-      "It appears that you are attempting to broadcast an RDD or reference an RDD from an ",
-      "action or transformation. RDD transformations and actions can only be invoked by the ",
-      "driver, not inside of other transformations; for example, ",
-      "rdd1.map(lambda x: rdd2.values.count() * x) is invalid because the values ",
-      "transformation and count action cannot be performed inside of the rdd1.map ",
-      "transformation. For more information, see SPARK-5063."
-    ]
-  },
-  "READ_ONLY": {
-    "message": [
-      "<object> is read-only."
-    ]
-  },
-  "RESPONSE_ALREADY_RECEIVED": {
-    "message": [
-      "OPERATION_NOT_FOUND on the server but responses were already received from it."
-    ]
-  },
-  "RESULT_COLUMNS_MISMATCH_FOR_ARROW_UDF": {
-    "message": [
-      "Column names of the returned pyarrow.Table do not match specified schema.<missing><extra>"
-    ]
-  },
-  "RESULT_COLUMNS_MISMATCH_FOR_PANDAS_UDF": {
-    "message": [
-      "Column names of the returned pandas.DataFrame do not match specified schema.<missing><extra>"
-    ]
-  },
-  "RESULT_LENGTH_MISMATCH_FOR_PANDAS_UDF": {
-    "message": [
-      "Number of columns of the returned pandas.DataFrame doesn't match specified schema. Expected: <expected> Actual: <actual>"
-    ]
-  },
-  "RESULT_LENGTH_MISMATCH_FOR_SCALAR_ITER_PANDAS_UDF": {
-    "message": [
-      "The length of output in Scalar iterator pandas UDF should be the same with the input's; however, the length of output was <output_length> and the length of input was <input_length>."
-    ]
-  },
-  "RESULT_TYPE_MISMATCH_FOR_ARROW_UDF": {
-    "message": [
-      "Columns do not match in their data type: <mismatch>."
-    ]
-  },
-  "RETRIES_EXCEEDED": {
-    "message": [
-      "The maximum number of retries has been exceeded."
-    ]
-  },
-  "REUSE_OBSERVATION": {
-    "message": [
-      "An Observation can be used with a DataFrame only once."
-    ]
-  },
-  "SCHEMA_MISMATCH_FOR_PANDAS_UDF": {
-    "message": [
-      "Result vector from pandas_udf was not the required length: expected <expected>, got <actual>."
-    ]
-  },
-  "SESSION_ALREADY_EXIST": {
-    "message": [
-      "Cannot start a remote Spark session because there is a regular Spark session already running."
-    ]
-  },
-  "SESSION_NEED_CONN_STR_OR_BUILDER": {
-    "message": [
-      "Needs either connection string or channelBuilder (mutually exclusive) to create a new SparkSession."
-    ]
-  },
-  "SESSION_NOT_SAME": {
-    "message": [
-      "Both Datasets must belong to the same SparkSession."
-    ]
-  },
-  "SESSION_OR_CONTEXT_EXISTS": {
-    "message": [
-      "There should not be an existing Spark Session or Spark Context."
-    ]
-  },
-  "SESSION_OR_CONTEXT_NOT_EXISTS": {
-    "message": [
-      "SparkContext or SparkSession should be created first."
-    ]
-  },
-  "SLICE_WITH_STEP": {
-    "message": [
-      "Slice with step is not supported."
-    ]
-  },
-  "STATE_NOT_EXISTS": {
-    "message": [
-      "State is either not defined or has already been removed."
-    ]
-  },
-  "STOP_ITERATION_OCCURRED": {
-    "message": [
-      "Caught StopIteration thrown from user's code; failing the task: <exc>"
-    ]
-  },
-  "STOP_ITERATION_OCCURRED_FROM_SCALAR_ITER_PANDAS_UDF": {
-    "message": [
-      "pandas iterator UDF should exhaust the input iterator."
-    ]
-  },
-  "STREAMING_CONNECT_SERIALIZATION_ERROR": {
-    "message": [
-      "Cannot serialize the function `<name>`. If you accessed the Spark session, or a DataFrame defined outside of the function, or any object that contains a Spark session, please be aware that they are not allowed in Spark Connect. For `foreachBatch`, please access the Spark session using `df.sparkSession`, where `df` is the first parameter in your `foreachBatch` function. For `StreamingQueryListener`, please access the Spark session using `self.spark`. For details please check out the PySpark doc for `foreachBatch` and `StreamingQueryListener`."
-    ]
-  },
-  "TEST_CLASS_NOT_COMPILED": {
-    "message": [
-      "<test_class_path> doesn't exist. Spark sql test classes are not compiled."
-    ]
-  },
-  "TOO_MANY_VALUES": {
-    "message": [
-      "Expected <expected> values for `<item>`, got <actual>."
-    ]
-  },
-  "TYPE_HINT_SHOULD_BE_SPECIFIED": {
-    "message": [
-      "Type hints for <target> should be specified; however, got <sig>."
-    ]
-  },
-  "UDF_RETURN_TYPE": {
-    "message": [
-      "Return type of the user-defined function should be <expected>, but is <actual>."
-    ]
-  },
-  "UDTF_ARROW_TYPE_CAST_ERROR": {
-    "message": [
-      "Cannot convert the output value of the column '<col_name>' with type '<col_type>' to the specified return type of the column: '<arrow_type>'. Please check if the data types match and try again."
-    ]
-  },
-  "UDTF_CONSTRUCTOR_INVALID_IMPLEMENTS_ANALYZE_METHOD": {
-    "message": [
-      "Failed to evaluate the user-defined table function '<name>' because its constructor is invalid: the function implements the 'analyze' method, but its constructor has more than two arguments (including the 'self' reference). Please update the table function so that its constructor accepts exactly one 'self' argument, or one 'self' argument plus another argument for the result of the 'analyze' method, and try the query again."
-    ]
-  },
-  "UDTF_CONSTRUCTOR_INVALID_NO_ANALYZE_METHOD": {
-    "message": [
-      "Failed to evaluate the user-defined table function '<name>' because its constructor is invalid: the function does not implement the 'analyze' method, and its constructor has more than one argument (including the 'self' reference). Please update the table function so that its constructor accepts exactly one 'self' argument, and try the query again."
-    ]
-  },
-  "UDTF_EVAL_METHOD_ARGUMENTS_DO_NOT_MATCH_SIGNATURE": {
-    "message": [
-      "Failed to evaluate the user-defined table function '<name>' because the function arguments did not match the expected signature of the 'eval' method (<reason>). Please update the query so that this table function call provides arguments matching the expected signature, or else update the table function so that its 'eval' method accepts the provided arguments, and then try the query again."
-    ]
-  },
-  "UDTF_EXEC_ERROR": {
-    "message": [
-      "User defined table function encountered an error in the '<method_name>' method: <error>"
-    ]
-  },
-  "UDTF_INVALID_OUTPUT_ROW_TYPE": {
-    "message": [
-      "The type of an individual output row in the '<func>' method of the UDTF is invalid. Each row should be a tuple, list, or dict, but got '<type>'. Please make sure that the output rows are of the correct type."
-    ]
-  },
-  "UDTF_RETURN_NOT_ITERABLE": {
-    "message": [
-      "The return value of the '<func>' method of the UDTF is invalid. It should be an iterable (e.g., generator or list), but got '<type>'. Please make sure that the UDTF returns one of these types."
-    ]
-  },
-  "UDTF_RETURN_SCHEMA_MISMATCH": {
-    "message": [
-      "The number of columns in the result does not match the specified schema. Expected column count: <expected>, Actual column count: <actual>. Please make sure the values returned by the '<func>' method have the same number of columns as specified in the output schema."
-    ]
-  },
-  "UDTF_RETURN_TYPE_MISMATCH": {
-    "message": [
-      "Mismatch in return type for the UDTF '<name>'. Expected a 'StructType', but got '<return_type>'. Please ensure the return type is a correctly formatted StructType."
-    ]
-  },
-  "UDTF_SERIALIZATION_ERROR": {
-    "message": [
-      "Cannot serialize the UDTF '<name>': <message>"
-    ]
-  },
-  "UNEXPECTED_RESPONSE_FROM_SERVER": {
-    "message": [
-      "Unexpected response from iterator server."
-    ]
-  },
-  "UNEXPECTED_TUPLE_WITH_STRUCT": {
-    "message": [
-      "Unexpected tuple <tuple> with StructType."
-    ]
-  },
-  "UNKNOWN_EXPLAIN_MODE": {
-    "message": [
-      "Unknown explain mode: '<explain_mode>'. Accepted explain modes are 'simple', 'extended', 'codegen', 'cost', 'formatted'."
-    ]
-  },
-  "UNKNOWN_INTERRUPT_TYPE": {
-    "message": [
-      "Unknown interrupt type: '<interrupt_type>'. Accepted interrupt types are 'all'."
-    ]
-  },
-  "UNKNOWN_RESPONSE": {
-    "message": [
-      "Unknown response: <response>."
-    ]
-  },
-  "UNKNOWN_VALUE_FOR": {
-    "message": [
-      "Unknown value for `<var>`."
-    ]
-  },
-  "UNSUPPORTED_DATA_TYPE": {
-    "message": [
-      "Unsupported DataType `<data_type>`."
-    ]
-  },
-  "UNSUPPORTED_DATA_TYPE_FOR_ARROW": {
-    "message": [
-      "Single data type <data_type> is not supported with Arrow."
-    ]
-  },
-  "UNSUPPORTED_DATA_TYPE_FOR_ARROW_CONVERSION": {
-    "message": [
-      "<data_type> is not supported in conversion to Arrow."
-    ]
-  },
-  "UNSUPPORTED_DATA_TYPE_FOR_ARROW_VERSION": {
-    "message": [
-      "<data_type> is only supported with pyarrow 2.0.0 and above."
-    ]
-  },
-  "UNSUPPORTED_JOIN_TYPE": {
-    "message": [
-      "Unsupported join type: <join_type>. Supported join types include: 'inner', 'outer', 'full', 'fullouter', 'full_outer', 'leftouter', 'left', 'left_outer', 'rightouter', 'right', 'right_outer', 'leftsemi', 'left_semi', 'semi', 'leftanti', 'left_anti', 'anti', 'cross'."
-    ]
-  },
-  "UNSUPPORTED_LITERAL": {
-    "message": [
-      "Unsupported Literal '<literal>'."
-    ]
-  },
-  "UNSUPPORTED_LOCAL_CONNECTION_STRING": {
-    "message": [
-      "Creating new SparkSessions with `local` connection string is not supported."
-    ]
-  },
-  "UNSUPPORTED_NUMPY_ARRAY_SCALAR": {
-    "message": [
-      "The type of array scalar '<dtype>' is not supported."
-    ]
-  },
-  "UNSUPPORTED_OPERATION": {
-    "message": [
-      "<operation> is not supported."
-    ]
-  },
-  "UNSUPPORTED_PACKAGE_VERSION": {
-    "message": [
-      "<package_name> >= <minimum_version> must be installed; however, your version is <current_version>."
-    ]
-  },
-  "UNSUPPORTED_PARAM_TYPE_FOR_HIGHER_ORDER_FUNCTION": {
-    "message": [
-      "Function `<func_name>` should use only POSITIONAL or POSITIONAL OR KEYWORD arguments."
-    ]
-  },
-  "UNSUPPORTED_SIGNATURE": {
-    "message": [
-      "Unsupported signature: <signature>."
-    ]
-  },
-  "UNSUPPORTED_WITH_ARROW_OPTIMIZATION": {
-    "message": [
-      "<feature> is not supported with Arrow optimization enabled in Python UDFs. Disable 'spark.sql.execution.pythonUDF.arrow.enabled' to workaround."
-    ]
-  },
-  "VALUE_ALLOWED": {
-    "message": [
-      "Value for `<arg_name>` does not allow <disallowed_value>."
-    ]
-  },
-  "VALUE_NOT_ACCESSIBLE": {
-    "message": [
-      "Value `<value>` cannot be accessed inside tasks."
-    ]
-  },
-  "VALUE_NOT_ALLOWED": {
-    "message": [
-      "Value for `<arg_name>` has to be amongst the following values: <allowed_values>."
-    ]
-  },
-  "VALUE_NOT_ANY_OR_ALL": {
-    "message": [
-      "Value for `<arg_name>` must be 'any' or 'all', got '<arg_value>'."
-    ]
-  },
-  "VALUE_NOT_BETWEEN": {
-    "message": [
-      "Value for `<arg_name>` must be between <min> and <max>."
-    ]
-  },
-  "VALUE_NOT_NON_EMPTY_STR": {
-    "message": [
-      "Value for `<arg_name>` must be a non-empty string, got '<arg_value>'."
-    ]
-  },
-  "VALUE_NOT_PEARSON": {
-    "message": [
-      "Value for `<arg_name>` only supports the 'pearson', got '<arg_value>'."
-    ]
-  },
-  "VALUE_NOT_PLAIN_COLUMN_REFERENCE": {
-    "message": [
-      "Value `<val>` in `<field_name>` should be a plain column reference such as `df.col` or `col('column')`."
-    ]
-  },
-  "VALUE_NOT_POSITIVE": {
-    "message": [
-      "Value for `<arg_name>` must be positive, got '<arg_value>'."
-    ]
-  },
-  "VALUE_NOT_TRUE": {
-    "message": [
-      "Value for `<arg_name>` must be True, got '<arg_value>'."
-    ]
-  },
-  "VALUE_OUT_OF_BOUND": {
-    "message": [
-      "Value for `<arg_name>` must be greater than <lower_bound> or less than <upper_bound>, got <actual>"
-    ]
-  },
-  "WRONG_NUM_ARGS_FOR_HIGHER_ORDER_FUNCTION": {
-    "message": [
-      "Function `<func_name>` should take between 1 and 3 arguments, but the provided function takes <num_args>."
-    ]
-  },
-  "WRONG_NUM_COLUMNS": {
-    "message": [
-      "Function `<func_name>` should take at least <num_cols> columns."
-    ]
-  }
-}
-'''
-
+# Note: Though we call them "error classes" here, the proper name is "error conditions",
+#   hence why the name of the JSON file is different.
+#   For more information, please see: https://issues.apache.org/jira/browse/SPARK-46810
+# Note: When we drop support for Python 3.8, we should migrate from importlib.resources.read_text()
+#   to importlib.resources.files().joinpath().read_text().
+#   See: https://docs.python.org/3/library/importlib.resources.html#importlib.resources.open_text
+ERROR_CLASSES_JSON = importlib.resources.read_text("pyspark.errors", "error-conditions.json")

Review Comment:
   I agree, it's sensitive. We definitely don't want to precipitate an emergency release.
   
   I am happy to volunteer for pre-release testing whenever the next release candidate for Spark 4.0 comes out.
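
   In that spirit, a minimal smoke test one could run against a release candidate to confirm the JSON ships and parses (illustrative only, assuming the candidate is installed):

   ```python
   import importlib.resources
   import json

   def smoke_test_error_conditions() -> None:
       raw = importlib.resources.read_text("pyspark.errors", "error-conditions.json")
       conditions = json.loads(raw)
       assert conditions, "error-conditions.json is empty"
       for name, spec in conditions.items():
           assert isinstance(spec.get("message"), list), f"{name} lacks a message list"

   smoke_test_error_conditions()
   ```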





Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #44920:
URL: https://github.com/apache/spark/pull/44920#discussion_r1470588168


##########
python/pyspark/errors/error_classes.py:
##########
@@ -15,1110 +15,14 @@
 # limitations under the License.
 #
 
-# NOTE: Automatically sort this file via
-# - cd $SPARK_HOME
-# - bin/pyspark
-# - from pyspark.errors.exceptions import _write_self; _write_self()
 import json
+import importlib.resources
 
-
-ERROR_CLASSES_JSON = '''
-{
-  "APPLICATION_NAME_NOT_SET": {
-    "message": [
-      "An application name must be set in your configuration."
-    ]
-  },
-  "ARGUMENT_REQUIRED": {
-    "message": [
-      "Argument `<arg_name>` is required when <condition>."
-    ]
-  },
-  "ARROW_LEGACY_IPC_FORMAT": {
-    "message": [
-      "Arrow legacy IPC format is not supported in PySpark, please unset ARROW_PRE_0_15_IPC_FORMAT."
-    ]
-  },
-  "ATTRIBUTE_NOT_CALLABLE": {
-    "message": [
-      "Attribute `<attr_name>` in provided object `<obj_name>` is not callable."
-    ]
-  },
-  "ATTRIBUTE_NOT_SUPPORTED": {
-    "message": [
-      "Attribute `<attr_name>` is not supported."
-    ]
-  },
-  "AXIS_LENGTH_MISMATCH": {
-    "message": [
-      "Length mismatch: Expected axis has <expected_length> element, new values have <actual_length> elements."
-    ]
-  },
-  "BROADCAST_VARIABLE_NOT_LOADED": {
-    "message": [
-      "Broadcast variable `<variable>` not loaded."
-    ]
-  },
-  "CALL_BEFORE_INITIALIZE": {
-    "message": [
-      "Not supported to call `<func_name>` before initialize <object>."
-    ]
-  },
-  "CANNOT_ACCEPT_OBJECT_IN_TYPE": {
-    "message": [
-      "`<data_type>` can not accept object `<obj_name>` in type `<obj_type>`."
-    ]
-  },
-  "CANNOT_ACCESS_TO_DUNDER": {
-    "message": [
-      "Dunder(double underscore) attribute is for internal use only."
-    ]
-  },
-  "CANNOT_APPLY_IN_FOR_COLUMN": {
-    "message": [
-      "Cannot apply 'in' operator against a column: please use 'contains' in a string column or 'array_contains' function for an array column."
-    ]
-  },
-  "CANNOT_BE_EMPTY": {
-    "message": [
-      "At least one <item> must be specified."
-    ]
-  },
-  "CANNOT_BE_NONE": {
-    "message": [
-      "Argument `<arg_name>` cannot be None."
-    ]
-  },
-  "CANNOT_CONFIGURE_SPARK_CONNECT": {
-    "message": [
-      "Spark Connect server cannot be configured: Existing [<existing_url>], New [<new_url>]."
-    ]
-  },
-  "CANNOT_CONFIGURE_SPARK_CONNECT_MASTER": {
-    "message": [
-      "Spark Connect server and Spark master cannot be configured together: Spark master [<master_url>], Spark Connect [<connect_url>]."
-    ]
-  },
-  "CANNOT_CONVERT_COLUMN_INTO_BOOL": {
-    "message": [
-      "Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions."
-    ]
-  },
-  "CANNOT_CONVERT_TYPE": {
-    "message": [
-      "Cannot convert <from_type> into <to_type>."
-    ]
-  },
-  "CANNOT_DETERMINE_TYPE": {
-    "message": [
-      "Some of types cannot be determined after inferring."
-    ]
-  },
-  "CANNOT_GET_BATCH_ID": {
-    "message": [
-      "Could not get batch id from <obj_name>."
-    ]
-  },
-  "CANNOT_INFER_ARRAY_TYPE": {
-    "message": [
-      "Can not infer Array Type from a list with None as the first element."
-    ]
-  },
-  "CANNOT_INFER_EMPTY_SCHEMA": {
-    "message": [
-      "Can not infer schema from an empty dataset."
-    ]
-  },
-  "CANNOT_INFER_SCHEMA_FOR_TYPE": {
-    "message": [
-      "Can not infer schema for type: `<data_type>`."
-    ]
-  },
-  "CANNOT_INFER_TYPE_FOR_FIELD": {
-    "message": [
-      "Unable to infer the type of the field `<field_name>`."
-    ]
-  },
-  "CANNOT_MERGE_TYPE": {
-    "message": [
-      "Can not merge type `<data_type1>` and `<data_type2>`."
-    ]
-  },
-  "CANNOT_OPEN_SOCKET": {
-    "message": [
-      "Can not open socket: <errors>."
-    ]
-  },
-  "CANNOT_PARSE_DATATYPE": {
-    "message": [
-      "Unable to parse datatype. <msg>."
-    ]
-  },
-  "CANNOT_PROVIDE_METADATA": {
-    "message": [
-      "Metadata can only be provided for a single column."
-    ]
-  },
-  "CANNOT_SET_TOGETHER": {
-    "message": [
-      "<arg_list> should not be set together."
-    ]
-  },
-  "CANNOT_SPECIFY_RETURN_TYPE_FOR_UDF": {
-    "message": [
-      "returnType can not be specified when `<arg_name>` is a user-defined function, but got <return_type>."
-    ]
-  },
-  "CANNOT_WITHOUT": {
-    "message": [
-      "Cannot <condition1> without <condition2>."
-    ]
-  },
-  "COLUMN_IN_LIST": {
-    "message": [
-      "`<func_name>` does not allow a Column in a list."
-    ]
-  },
-  "CONNECT_URL_ALREADY_DEFINED": {
-    "message": [
-      "Only one Spark Connect client URL can be set; however, got a different URL [<new_url>] from the existing [<existing_url>]."
-    ]
-  },
-  "CONNECT_URL_NOT_SET": {
-    "message": [
-      "Cannot create a Spark Connect session because the Spark Connect remote URL has not been set. Please define the remote URL by setting either the 'spark.remote' option or the 'SPARK_REMOTE' environment variable."
-    ]
-  },
-  "CONTEXT_ONLY_VALID_ON_DRIVER": {
-    "message": [
-      "It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063."
-    ]
-  },
-  "CONTEXT_UNAVAILABLE_FOR_REMOTE_CLIENT": {
-    "message": [
-      "Remote client cannot create a SparkContext. Create SparkSession instead."
-    ]
-  },
-  "DATA_SOURCE_INVALID_RETURN_TYPE": {
-    "message": [
-      "Unsupported return type ('<type>') from Python data source '<name>'. Expected types: <supported_types>."
-    ]
-  },
-  "DATA_SOURCE_RETURN_SCHEMA_MISMATCH": {
-    "message": [
-      "Return schema mismatch in the result from 'read' method. Expected: <expected> columns, Found: <actual> columns. Make sure the returned values match the required output schema."
-    ]
-  },
-  "DATA_SOURCE_TYPE_MISMATCH": {
-    "message": [
-      "Expected <expected>, but got <actual>."
-    ]
-  },
-  "DIFFERENT_PANDAS_DATAFRAME": {
-    "message": [
-      "DataFrames are not almost equal:",
-      "Left:",
-      "<left>",
-      "<left_dtype>",
-      "Right:",
-      "<right>",
-      "<right_dtype>"
-    ]
-  },
-  "DIFFERENT_PANDAS_INDEX": {
-    "message": [
-      "Indices are not almost equal:",
-      "Left:",
-      "<left>",
-      "<left_dtype>",
-      "Right:",
-      "<right>",
-      "<right_dtype>"
-    ]
-  },
-  "DIFFERENT_PANDAS_MULTIINDEX": {
-    "message": [
-      "MultiIndices are not almost equal:",
-      "Left:",
-      "<left>",
-      "<left_dtype>",
-      "Right:",
-      "<right>",
-      "<right_dtype>"
-    ]
-  },
-  "DIFFERENT_PANDAS_SERIES": {
-    "message": [
-      "Series are not almost equal:",
-      "Left:",
-      "<left>",
-      "<left_dtype>",
-      "Right:",
-      "<right>",
-      "<right_dtype>"
-    ]
-  },
-  "DIFFERENT_ROWS": {
-    "message": [
-      "<error_msg>"
-    ]
-  },
-  "DIFFERENT_SCHEMA": {
-    "message": [
-      "Schemas do not match.",
-      "--- actual",
-      "+++ expected",
-      "<error_msg>"
-    ]
-  },
-  "DISALLOWED_TYPE_FOR_CONTAINER": {
-    "message": [
-      "Argument `<arg_name>`(type: <arg_type>) should only contain a type in [<allowed_types>], got <item_type>"
-    ]
-  },
-  "DUPLICATED_FIELD_NAME_IN_ARROW_STRUCT": {
-    "message": [
-      "Duplicated field names in Arrow Struct are not allowed, got <field_names>"
-    ]
-  },
-  "ERROR_OCCURRED_WHILE_CALLING": {
-    "message": [
-      "An error occurred while calling <func_name>: <error_msg>."
-    ]
-  },
-  "HIGHER_ORDER_FUNCTION_SHOULD_RETURN_COLUMN": {
-    "message": [
-      "Function `<func_name>` should return Column, got <return_type>."
-    ]
-  },
-  "INCORRECT_CONF_FOR_PROFILE": {
-    "message": [
-      "`spark.python.profile` or `spark.python.profile.memory` configuration",
-      " must be set to `true` to enable Python profile."
-    ]
-  },
-  "INDEX_NOT_POSITIVE": {
-    "message": [
-      "Index must be positive, got '<index>'."
-    ]
-  },
-  "INDEX_OUT_OF_RANGE": {
-    "message": [
-      "<arg_name> index out of range, got '<index>'."
-    ]
-  },
-  "INVALID_ARROW_UDTF_RETURN_TYPE": {
-    "message": [
-      "The return type of the arrow-optimized Python UDTF should be of type 'pandas.DataFrame', but the '<func>' method returned a value of type <return_type> with value: <value>."
-    ]
-  },
-  "INVALID_BROADCAST_OPERATION": {
-    "message": [
-      "Broadcast can only be <operation> in driver."
-    ]
-  },
-  "INVALID_CALL_ON_UNRESOLVED_OBJECT": {
-    "message": [
-      "Invalid call to `<func_name>` on unresolved object."
-    ]
-  },
-  "INVALID_CONNECT_URL": {
-    "message": [
-      "Invalid URL for Spark Connect: <detail>"
-    ]
-  },
-  "INVALID_INTERVAL_CASTING": {
-    "message": [
-      "Interval <start_field> to <end_field> is invalid."
-    ]
-  },
-  "INVALID_ITEM_FOR_CONTAINER": {
-    "message": [
-      "All items in `<arg_name>` should be in <allowed_types>, got <item_type>."
-    ]
-  },
-  "INVALID_MULTIPLE_ARGUMENT_CONDITIONS": {
-    "message": [
-      "[{arg_names}] cannot be <condition>."
-    ]
-  },
-  "INVALID_NDARRAY_DIMENSION": {
-    "message": [
-      "NumPy array input should be of <dimensions> dimensions."
-    ]
-  },
-  "INVALID_NUMBER_OF_DATAFRAMES_IN_GROUP": {
-    "message": [
-      "Invalid number of dataframes in group <dataframes_in_group>."
-    ]
-  },
-  "INVALID_PANDAS_UDF": {
-    "message": [
-      "Invalid function: <detail>"
-    ]
-  },
-  "INVALID_PANDAS_UDF_TYPE": {
-    "message": [
-      "`<arg_name>` should be one of the values from PandasUDFType, got <arg_type>"
-    ]
-  },
-  "INVALID_RETURN_TYPE_FOR_ARROW_UDF": {
-    "message": [
-      "Grouped and Cogrouped map Arrow UDF should return StructType for <eval_type>, got <return_type>."
-    ]
-  },
-  "INVALID_RETURN_TYPE_FOR_PANDAS_UDF": {
-    "message": [
-      "Pandas UDF should return StructType for <eval_type>, got <return_type>."
-    ]
-  },
-  "INVALID_SESSION_UUID_ID": {
-    "message": [
-      "Parameter value <arg_name> must be a valid UUID format: <origin>"
-    ]
-  },
-  "INVALID_TIMEOUT_TIMESTAMP": {
-    "message": [
-      "Timeout timestamp (<timestamp>) cannot be earlier than the current watermark (<watermark>)."
-    ]
-  },
-  "INVALID_TYPE": {
-    "message": [
-      "Argument `<arg_name>` should not be a <arg_type>."
-    ]
-  },
-  "INVALID_TYPENAME_CALL": {
-    "message": [
-      "StructField does not have typeName. Use typeName on its type explicitly instead."
-    ]
-  },
-  "INVALID_TYPE_DF_EQUALITY_ARG": {
-    "message": [
-      "Expected type <expected_type> for `<arg_name>` but got type <actual_type>."
-    ]
-  },
-  "INVALID_UDF_EVAL_TYPE": {
-    "message": [
-      "Eval type for UDF must be <eval_type>."
-    ]
-  },
-  "INVALID_UDTF_BOTH_RETURN_TYPE_AND_ANALYZE": {
-    "message": [
-      "The UDTF '<name>' is invalid. It has both its return type and an 'analyze' attribute. Please make it have one of either the return type or the 'analyze' static method in '<name>' and try again."
-    ]
-  },
-  "INVALID_UDTF_EVAL_TYPE": {
-    "message": [
-      "The eval type for the UDTF '<name>' is invalid. It must be one of <eval_type>."
-    ]
-  },
-  "INVALID_UDTF_HANDLER_TYPE": {
-    "message": [
-      "The UDTF is invalid. The function handler must be a class, but got '<type>'. Please provide a class as the function handler."
-    ]
-  },
-  "INVALID_UDTF_NO_EVAL": {
-    "message": [
-      "The UDTF '<name>' is invalid. It does not implement the required 'eval' method. Please implement the 'eval' method in '<name>' and try again."
-    ]
-  },
-  "INVALID_UDTF_RETURN_TYPE": {
-    "message": [
-      "The UDTF '<name>' is invalid. It does not specify its return type or implement the required 'analyze' static method. Please specify the return type or implement the 'analyze' static method in '<name>' and try again."
-    ]
-  },
-  "INVALID_WHEN_USAGE": {
-    "message": [
-      "when() can only be applied on a Column previously generated by when() function, and cannot be applied once otherwise() is applied."
-    ]
-  },
-  "INVALID_WINDOW_BOUND_TYPE": {
-    "message": [
-      "Invalid window bound type: <window_bound_type>."
-    ]
-  },
-  "JAVA_GATEWAY_EXITED": {
-    "message": [
-      "Java gateway process exited before sending its port number."
-    ]
-  },
-  "JVM_ATTRIBUTE_NOT_SUPPORTED": {
-    "message": [
-      "Attribute `<attr_name>` is not supported in Spark Connect as it depends on the JVM. If you need to use this attribute, do not use Spark Connect when creating your session. Visit https://spark.apache.org/docs/latest/sql-getting-started.html#starting-point-sparksession for creating regular Spark Session in detail."
-    ]
-  },
-  "KEY_NOT_EXISTS": {
-    "message": [
-      "Key `<key>` is not exists."
-    ]
-  },
-  "KEY_VALUE_PAIR_REQUIRED": {
-    "message": [
-      "Key-value pair or a list of pairs is required."
-    ]
-  },
-  "LENGTH_SHOULD_BE_THE_SAME": {
-    "message": [
-      "<arg1> and <arg2> should be of the same length, got <arg1_length> and <arg2_length>."
-    ]
-  },
-  "MASTER_URL_NOT_SET": {
-    "message": [
-      "A master URL must be set in your configuration."
-    ]
-  },
-  "MISSING_LIBRARY_FOR_PROFILER": {
-    "message": [
-      "Install the 'memory_profiler' library in the cluster to enable memory profiling."
-    ]
-  },
-  "MISSING_VALID_PLAN": {
-    "message": [
-      "Argument to <operator> does not contain a valid plan."
-    ]
-  },
-  "MIXED_TYPE_REPLACEMENT": {
-    "message": [
-      "Mixed type replacements are not supported."
-    ]
-  },
-  "NEGATIVE_VALUE": {
-    "message": [
-      "Value for `<arg_name>` must be greater than or equal to 0, got '<arg_value>'."
-    ]
-  },
-  "NOT_BOOL": {
-    "message": [
-      "Argument `<arg_name>` should be a bool, got <arg_type>."
-    ]
-  },
-  "NOT_BOOL_OR_DICT_OR_FLOAT_OR_INT_OR_LIST_OR_STR_OR_TUPLE": {
-    "message": [
-      "Argument `<arg_name>` should be a bool, dict, float, int, str or tuple, got <arg_type>."
-    ]
-  },
-  "NOT_BOOL_OR_DICT_OR_FLOAT_OR_INT_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a bool, dict, float, int or str, got <arg_type>."
-    ]
-  },
-  "NOT_BOOL_OR_FLOAT_OR_INT": {
-    "message": [
-      "Argument `<arg_name>` should be a bool, float or int, got <arg_type>."
-    ]
-  },
-  "NOT_BOOL_OR_FLOAT_OR_INT_OR_LIST_OR_NONE_OR_STR_OR_TUPLE": {
-    "message": [
-      "Argument `<arg_name>` should be a bool, float, int, list, None, str or tuple, got <arg_type>."
-    ]
-  },
-  "NOT_BOOL_OR_FLOAT_OR_INT_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a bool, float, int or str, got <arg_type>."
-    ]
-  },
-  "NOT_BOOL_OR_LIST": {
-    "message": [
-      "Argument `<arg_name>` should be a bool or list, got <arg_type>."
-    ]
-  },
-  "NOT_BOOL_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a bool or str, got <arg_type>."
-    ]
-  },
-  "NOT_CALLABLE": {
-    "message": [
-      "Argument `<arg_name>` should be a callable, got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN": {
-    "message": [
-      "Argument `<arg_name>` should be a Column, got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN_OR_DATATYPE_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a Column, str or DataType, but got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN_OR_FLOAT_OR_INT_OR_LIST_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a Column, float, integer, list or string, got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN_OR_INT": {
-    "message": [
-      "Argument `<arg_name>` should be a Column or int, got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN_OR_INT_OR_LIST_OR_STR_OR_TUPLE": {
-    "message": [
-      "Argument `<arg_name>` should be a Column, int, list, str or tuple, got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN_OR_INT_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a Column, int or str, got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN_OR_LIST_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a Column, list or str, got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a Column or str, got <arg_type>."
-    ]
-  },
-  "NOT_COLUMN_OR_STR_OR_STRUCT": {
-    "message": [
-      "Argument `<arg_name>` should be a StructType, Column or str, got <arg_type>."
-    ]
-  },
-  "NOT_DATAFRAME": {
-    "message": [
-      "Argument `<arg_name>` should be a DataFrame, got <arg_type>."
-    ]
-  },
-  "NOT_DATATYPE_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a DataType or str, got <arg_type>."
-    ]
-  },
-  "NOT_DICT": {
-    "message": [
-      "Argument `<arg_name>` should be a dict, got <arg_type>."
-    ]
-  },
-  "NOT_EXPRESSION": {
-    "message": [
-      "Argument `<arg_name>` should be an Expression, got <arg_type>."
-    ]
-  },
-  "NOT_FLOAT_OR_INT": {
-    "message": [
-      "Argument `<arg_name>` should be a float or int, got <arg_type>."
-    ]
-  },
-  "NOT_FLOAT_OR_INT_OR_LIST_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a float, int, list or str, got <arg_type>."
-    ]
-  },
-  "NOT_IMPLEMENTED": {
-    "message": [
-      "<feature> is not implemented."
-    ]
-  },
-  "NOT_INSTANCE_OF": {
-    "message": [
-      "<value> is not an instance of type <type>."
-    ]
-  },
-  "NOT_INT": {
-    "message": [
-      "Argument `<arg_name>` should be an int, got <arg_type>."
-    ]
-  },
-  "NOT_INT_OR_SLICE_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be an int, slice or str, got <arg_type>."
-    ]
-  },
-  "NOT_IN_BARRIER_STAGE": {
-    "message": [
-      "It is not in a barrier stage."
-    ]
-  },
-  "NOT_ITERABLE": {
-    "message": [
-      "<objectName> is not iterable."
-    ]
-  },
-  "NOT_LIST": {
-    "message": [
-      "Argument `<arg_name>` should be a list, got <arg_type>."
-    ]
-  },
-  "NOT_LIST_OF_COLUMN": {
-    "message": [
-      "Argument `<arg_name>` should be a list[Column]."
-    ]
-  },
-  "NOT_LIST_OF_COLUMN_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a list[Column]."
-    ]
-  },
-  "NOT_LIST_OF_FLOAT_OR_INT": {
-    "message": [
-      "Argument `<arg_name>` should be a list[float, int], got <arg_type>."
-    ]
-  },
-  "NOT_LIST_OF_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a list[str], got <arg_type>."
-    ]
-  },
-  "NOT_LIST_OR_NONE_OR_STRUCT": {
-    "message": [
-      "Argument `<arg_name>` should be a list, None or StructType, got <arg_type>."
-    ]
-  },
-  "NOT_LIST_OR_STR_OR_TUPLE": {
-    "message": [
-      "Argument `<arg_name>` should be a list, str or tuple, got <arg_type>."
-    ]
-  },
-  "NOT_LIST_OR_TUPLE": {
-    "message": [
-      "Argument `<arg_name>` should be a list or tuple, got <arg_type>."
-    ]
-  },
-  "NOT_NUMERIC_COLUMNS": {
-    "message": [
-      "Numeric aggregation function can only be applied on numeric columns, got <invalid_columns>."
-    ]
-  },
-  "NOT_OBSERVATION_OR_STR": {
-    "message": [
-      "Argument `<arg_name>` should be an Observation or str, got <arg_type>."
-    ]
-  },
-  "NOT_SAME_TYPE": {
-    "message": [
-      "Argument `<arg_name1>` and `<arg_name2>` should be the same type, got <arg_type1> and <arg_type2>."
-    ]
-  },
-  "NOT_STR": {
-    "message": [
-      "Argument `<arg_name>` should be a str, got <arg_type>."
-    ]
-  },
-  "NOT_STRUCT": {
-    "message": [
-      "Argument `<arg_name>` should be a struct type, got <arg_type>."
-    ]
-  },
-  "NOT_STR_OR_LIST_OF_RDD": {
-    "message": [
-      "Argument `<arg_name>` should be a str or list[RDD], got <arg_type>."
-    ]
-  },
-  "NOT_STR_OR_STRUCT": {
-    "message": [
-      "Argument `<arg_name>` should be a str or struct type, got <arg_type>."
-    ]
-  },
-  "NOT_WINDOWSPEC": {
-    "message": [
-      "Argument `<arg_name>` should be a WindowSpec, got <arg_type>."
-    ]
-  },
-  "NO_ACTIVE_EXCEPTION": {
-    "message": [
-      "No active exception."
-    ]
-  },
-  "NO_ACTIVE_OR_DEFAULT_SESSION": {
-    "message": [
-      "No active or default Spark session found. Please create a new Spark session before running the code."
-    ]
-  },
-  "NO_ACTIVE_SESSION": {
-    "message": [
-      "No active Spark session found. Please create a new Spark session before running the code."
-    ]
-  },
-  "NO_OBSERVE_BEFORE_GET": {
-    "message": [
-      "Should observe by calling `DataFrame.observe` before `get`."
-    ]
-  },
-  "NO_SCHEMA_AND_DRIVER_DEFAULT_SCHEME": {
-    "message": [
-      "Only allows <arg_name> to be a path without scheme, and Spark Driver should use the default scheme to determine the destination file system."
-    ]
-  },
-  "ONLY_ALLOWED_FOR_SINGLE_COLUMN": {
-    "message": [
-      "Argument `<arg_name>` can only be provided for a single column."
-    ]
-  },
-  "ONLY_ALLOW_SINGLE_TRIGGER": {
-    "message": [
-      "Only a single trigger is allowed."
-    ]
-  },
-  "ONLY_SUPPORTED_WITH_SPARK_CONNECT": {
-    "message": [
-      "<feature> is only supported with Spark Connect; however, the current Spark session does not use Spark Connect."
-    ]
-  },
-  "PACKAGE_NOT_INSTALLED": {
-    "message": [
-      "<package_name> >= <minimum_version> must be installed; however, it was not found."
-    ]
-  },
-  "PIPE_FUNCTION_EXITED": {
-    "message": [
-      "Pipe function `<func_name>` exited with error code <error_code>."
-    ]
-  },
-  "PYTHON_HASH_SEED_NOT_SET": {
-    "message": [
-      "Randomness of hash of string should be disabled via PYTHONHASHSEED."
-    ]
-  },
-  "PYTHON_VERSION_MISMATCH": {
-    "message": [
-      "Python in worker has different version: <worker_version> than that in driver: <driver_version>, PySpark cannot run with different minor versions.",
-      "Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set."
-    ]
-  },
-  "RDD_TRANSFORM_ONLY_VALID_ON_DRIVER": {
-    "message": [
-      "It appears that you are attempting to broadcast an RDD or reference an RDD from an ",
-      "action or transformation. RDD transformations and actions can only be invoked by the ",
-      "driver, not inside of other transformations; for example, ",
-      "rdd1.map(lambda x: rdd2.values.count() * x) is invalid because the values ",
-      "transformation and count action cannot be performed inside of the rdd1.map ",
-      "transformation. For more information, see SPARK-5063."
-    ]
-  },
-  "READ_ONLY": {
-    "message": [
-      "<object> is read-only."
-    ]
-  },
-  "RESPONSE_ALREADY_RECEIVED": {
-    "message": [
-      "OPERATION_NOT_FOUND on the server but responses were already received from it."
-    ]
-  },
-  "RESULT_COLUMNS_MISMATCH_FOR_ARROW_UDF": {
-    "message": [
-      "Column names of the returned pyarrow.Table do not match specified schema.<missing><extra>"
-    ]
-  },
-  "RESULT_COLUMNS_MISMATCH_FOR_PANDAS_UDF": {
-    "message": [
-      "Column names of the returned pandas.DataFrame do not match specified schema.<missing><extra>"
-    ]
-  },
-  "RESULT_LENGTH_MISMATCH_FOR_PANDAS_UDF": {
-    "message": [
-      "Number of columns of the returned pandas.DataFrame doesn't match specified schema. Expected: <expected> Actual: <actual>"
-    ]
-  },
-  "RESULT_LENGTH_MISMATCH_FOR_SCALAR_ITER_PANDAS_UDF": {
-    "message": [
-      "The length of output in Scalar iterator pandas UDF should be the same with the input's; however, the length of output was <output_length> and the length of input was <input_length>."
-    ]
-  },
-  "RESULT_TYPE_MISMATCH_FOR_ARROW_UDF": {
-    "message": [
-      "Columns do not match in their data type: <mismatch>."
-    ]
-  },
-  "RETRIES_EXCEEDED": {
-    "message": [
-      "The maximum number of retries has been exceeded."
-    ]
-  },
-  "REUSE_OBSERVATION": {
-    "message": [
-      "An Observation can be used with a DataFrame only once."
-    ]
-  },
-  "SCHEMA_MISMATCH_FOR_PANDAS_UDF": {
-    "message": [
-      "Result vector from pandas_udf was not the required length: expected <expected>, got <actual>."
-    ]
-  },
-  "SESSION_ALREADY_EXIST": {
-    "message": [
-      "Cannot start a remote Spark session because there is a regular Spark session already running."
-    ]
-  },
-  "SESSION_NEED_CONN_STR_OR_BUILDER": {
-    "message": [
-      "Needs either connection string or channelBuilder (mutually exclusive) to create a new SparkSession."
-    ]
-  },
-  "SESSION_NOT_SAME": {
-    "message": [
-      "Both Datasets must belong to the same SparkSession."
-    ]
-  },
-  "SESSION_OR_CONTEXT_EXISTS": {
-    "message": [
-      "There should not be an existing Spark Session or Spark Context."
-    ]
-  },
-  "SESSION_OR_CONTEXT_NOT_EXISTS": {
-    "message": [
-      "SparkContext or SparkSession should be created first."
-    ]
-  },
-  "SLICE_WITH_STEP": {
-    "message": [
-      "Slice with step is not supported."
-    ]
-  },
-  "STATE_NOT_EXISTS": {
-    "message": [
-      "State is either not defined or has already been removed."
-    ]
-  },
-  "STOP_ITERATION_OCCURRED": {
-    "message": [
-      "Caught StopIteration thrown from user's code; failing the task: <exc>"
-    ]
-  },
-  "STOP_ITERATION_OCCURRED_FROM_SCALAR_ITER_PANDAS_UDF": {
-    "message": [
-      "pandas iterator UDF should exhaust the input iterator."
-    ]
-  },
-  "STREAMING_CONNECT_SERIALIZATION_ERROR": {
-    "message": [
-      "Cannot serialize the function `<name>`. If you accessed the Spark session, or a DataFrame defined outside of the function, or any object that contains a Spark session, please be aware that they are not allowed in Spark Connect. For `foreachBatch`, please access the Spark session using `df.sparkSession`, where `df` is the first parameter in your `foreachBatch` function. For `StreamingQueryListener`, please access the Spark session using `self.spark`. For details please check out the PySpark doc for `foreachBatch` and `StreamingQueryListener`."
-    ]
-  },
-  "TEST_CLASS_NOT_COMPILED": {
-    "message": [
-      "<test_class_path> doesn't exist. Spark sql test classes are not compiled."
-    ]
-  },
-  "TOO_MANY_VALUES": {
-    "message": [
-      "Expected <expected> values for `<item>`, got <actual>."
-    ]
-  },
-  "TYPE_HINT_SHOULD_BE_SPECIFIED": {
-    "message": [
-      "Type hints for <target> should be specified; however, got <sig>."
-    ]
-  },
-  "UDF_RETURN_TYPE": {
-    "message": [
-      "Return type of the user-defined function should be <expected>, but is <actual>."
-    ]
-  },
-  "UDTF_ARROW_TYPE_CAST_ERROR": {
-    "message": [
-      "Cannot convert the output value of the column '<col_name>' with type '<col_type>' to the specified return type of the column: '<arrow_type>'. Please check if the data types match and try again."
-    ]
-  },
-  "UDTF_CONSTRUCTOR_INVALID_IMPLEMENTS_ANALYZE_METHOD": {
-    "message": [
-      "Failed to evaluate the user-defined table function '<name>' because its constructor is invalid: the function implements the 'analyze' method, but its constructor has more than two arguments (including the 'self' reference). Please update the table function so that its constructor accepts exactly one 'self' argument, or one 'self' argument plus another argument for the result of the 'analyze' method, and try the query again."
-    ]
-  },
-  "UDTF_CONSTRUCTOR_INVALID_NO_ANALYZE_METHOD": {
-    "message": [
-      "Failed to evaluate the user-defined table function '<name>' because its constructor is invalid: the function does not implement the 'analyze' method, and its constructor has more than one argument (including the 'self' reference). Please update the table function so that its constructor accepts exactly one 'self' argument, and try the query again."
-    ]
-  },
-  "UDTF_EVAL_METHOD_ARGUMENTS_DO_NOT_MATCH_SIGNATURE": {
-    "message": [
-      "Failed to evaluate the user-defined table function '<name>' because the function arguments did not match the expected signature of the 'eval' method (<reason>). Please update the query so that this table function call provides arguments matching the expected signature, or else update the table function so that its 'eval' method accepts the provided arguments, and then try the query again."
-    ]
-  },
-  "UDTF_EXEC_ERROR": {
-    "message": [
-      "User defined table function encountered an error in the '<method_name>' method: <error>"
-    ]
-  },
-  "UDTF_INVALID_OUTPUT_ROW_TYPE": {
-    "message": [
-      "The type of an individual output row in the '<func>' method of the UDTF is invalid. Each row should be a tuple, list, or dict, but got '<type>'. Please make sure that the output rows are of the correct type."
-    ]
-  },
-  "UDTF_RETURN_NOT_ITERABLE": {
-    "message": [
-      "The return value of the '<func>' method of the UDTF is invalid. It should be an iterable (e.g., generator or list), but got '<type>'. Please make sure that the UDTF returns one of these types."
-    ]
-  },
-  "UDTF_RETURN_SCHEMA_MISMATCH": {
-    "message": [
-      "The number of columns in the result does not match the specified schema. Expected column count: <expected>, Actual column count: <actual>. Please make sure the values returned by the '<func>' method have the same number of columns as specified in the output schema."
-    ]
-  },
-  "UDTF_RETURN_TYPE_MISMATCH": {
-    "message": [
-      "Mismatch in return type for the UDTF '<name>'. Expected a 'StructType', but got '<return_type>'. Please ensure the return type is a correctly formatted StructType."
-    ]
-  },
-  "UDTF_SERIALIZATION_ERROR": {
-    "message": [
-      "Cannot serialize the UDTF '<name>': <message>"
-    ]
-  },
-  "UNEXPECTED_RESPONSE_FROM_SERVER": {
-    "message": [
-      "Unexpected response from iterator server."
-    ]
-  },
-  "UNEXPECTED_TUPLE_WITH_STRUCT": {
-    "message": [
-      "Unexpected tuple <tuple> with StructType."
-    ]
-  },
-  "UNKNOWN_EXPLAIN_MODE": {
-    "message": [
-      "Unknown explain mode: '<explain_mode>'. Accepted explain modes are 'simple', 'extended', 'codegen', 'cost', 'formatted'."
-    ]
-  },
-  "UNKNOWN_INTERRUPT_TYPE": {
-    "message": [
-      "Unknown interrupt type: '<interrupt_type>'. Accepted interrupt types are 'all'."
-    ]
-  },
-  "UNKNOWN_RESPONSE": {
-    "message": [
-      "Unknown response: <response>."
-    ]
-  },
-  "UNKNOWN_VALUE_FOR": {
-    "message": [
-      "Unknown value for `<var>`."
-    ]
-  },
-  "UNSUPPORTED_DATA_TYPE": {
-    "message": [
-      "Unsupported DataType `<data_type>`."
-    ]
-  },
-  "UNSUPPORTED_DATA_TYPE_FOR_ARROW": {
-    "message": [
-      "Single data type <data_type> is not supported with Arrow."
-    ]
-  },
-  "UNSUPPORTED_DATA_TYPE_FOR_ARROW_CONVERSION": {
-    "message": [
-      "<data_type> is not supported in conversion to Arrow."
-    ]
-  },
-  "UNSUPPORTED_DATA_TYPE_FOR_ARROW_VERSION": {
-    "message": [
-      "<data_type> is only supported with pyarrow 2.0.0 and above."
-    ]
-  },
-  "UNSUPPORTED_JOIN_TYPE": {
-    "message": [
-      "Unsupported join type: <join_type>. Supported join types include: 'inner', 'outer', 'full', 'fullouter', 'full_outer', 'leftouter', 'left', 'left_outer', 'rightouter', 'right', 'right_outer', 'leftsemi', 'left_semi', 'semi', 'leftanti', 'left_anti', 'anti', 'cross'."
-    ]
-  },
-  "UNSUPPORTED_LITERAL": {
-    "message": [
-      "Unsupported Literal '<literal>'."
-    ]
-  },
-  "UNSUPPORTED_LOCAL_CONNECTION_STRING": {
-    "message": [
-      "Creating new SparkSessions with `local` connection string is not supported."
-    ]
-  },
-  "UNSUPPORTED_NUMPY_ARRAY_SCALAR": {
-    "message": [
-      "The type of array scalar '<dtype>' is not supported."
-    ]
-  },
-  "UNSUPPORTED_OPERATION": {
-    "message": [
-      "<operation> is not supported."
-    ]
-  },
-  "UNSUPPORTED_PACKAGE_VERSION": {
-    "message": [
-      "<package_name> >= <minimum_version> must be installed; however, your version is <current_version>."
-    ]
-  },
-  "UNSUPPORTED_PARAM_TYPE_FOR_HIGHER_ORDER_FUNCTION": {
-    "message": [
-      "Function `<func_name>` should use only POSITIONAL or POSITIONAL OR KEYWORD arguments."
-    ]
-  },
-  "UNSUPPORTED_SIGNATURE": {
-    "message": [
-      "Unsupported signature: <signature>."
-    ]
-  },
-  "UNSUPPORTED_WITH_ARROW_OPTIMIZATION": {
-    "message": [
-      "<feature> is not supported with Arrow optimization enabled in Python UDFs. Disable 'spark.sql.execution.pythonUDF.arrow.enabled' to workaround."
-    ]
-  },
-  "VALUE_ALLOWED": {
-    "message": [
-      "Value for `<arg_name>` does not allow <disallowed_value>."
-    ]
-  },
-  "VALUE_NOT_ACCESSIBLE": {
-    "message": [
-      "Value `<value>` cannot be accessed inside tasks."
-    ]
-  },
-  "VALUE_NOT_ALLOWED": {
-    "message": [
-      "Value for `<arg_name>` has to be amongst the following values: <allowed_values>."
-    ]
-  },
-  "VALUE_NOT_ANY_OR_ALL": {
-    "message": [
-      "Value for `<arg_name>` must be 'any' or 'all', got '<arg_value>'."
-    ]
-  },
-  "VALUE_NOT_BETWEEN": {
-    "message": [
-      "Value for `<arg_name>` must be between <min> and <max>."
-    ]
-  },
-  "VALUE_NOT_NON_EMPTY_STR": {
-    "message": [
-      "Value for `<arg_name>` must be a non-empty string, got '<arg_value>'."
-    ]
-  },
-  "VALUE_NOT_PEARSON": {
-    "message": [
-      "Value for `<arg_name>` only supports the 'pearson', got '<arg_value>'."
-    ]
-  },
-  "VALUE_NOT_PLAIN_COLUMN_REFERENCE": {
-    "message": [
-      "Value `<val>` in `<field_name>` should be a plain column reference such as `df.col` or `col('column')`."
-    ]
-  },
-  "VALUE_NOT_POSITIVE": {
-    "message": [
-      "Value for `<arg_name>` must be positive, got '<arg_value>'."
-    ]
-  },
-  "VALUE_NOT_TRUE": {
-    "message": [
-      "Value for `<arg_name>` must be True, got '<arg_value>'."
-    ]
-  },
-  "VALUE_OUT_OF_BOUND": {
-    "message": [
-      "Value for `<arg_name>` must be greater than <lower_bound> or less than <upper_bound>, got <actual>"
-    ]
-  },
-  "WRONG_NUM_ARGS_FOR_HIGHER_ORDER_FUNCTION": {
-    "message": [
-      "Function `<func_name>` should take between 1 and 3 arguments, but the provided function takes <num_args>."
-    ]
-  },
-  "WRONG_NUM_COLUMNS": {
-    "message": [
-      "Function `<func_name>` should take at least <num_cols> columns."
-    ]
-  }
-}
-'''
-
+# Note: Though we call them "error classes" here, the proper name is "error conditions",
+#   hence why the name of the JSON file is different.
+#   For more information, please see: https://issues.apache.org/jira/browse/SPARK-46810
+# Note: When we drop support for Python 3.8, we should migrate from importlib.resources.read_text()
+#   to importlib.resources.files().joinpath().read_text().
+#   See: https://docs.python.org/3/library/importlib.resources.html#importlib.resources.open_text
+ERROR_CLASSES_JSON = importlib.resources.read_text("pyspark.errors", "error-conditions.json")

Review Comment:
   Manual tests should be fine. I just want to be cautious here, because it's the very entry point of pyspark; if we get it wrong, we would need another release :-).
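
   For reference, a minimal sketch of the post-Python-3.8 form mentioned in the note above, assuming `error-conditions.json` stays packaged under `pyspark.errors`:

       import importlib.resources
       import json

       # Python 3.9+: files() returns a Traversable for the package directory,
       # replacing the read_text(package, resource) form used in the diff above.
       ERROR_CLASSES_JSON = (
           importlib.resources.files("pyspark.errors")
           .joinpath("error-conditions.json")
           .read_text()
       )

       # The raw JSON string is then parsed into a dict for error lookups.
       ERROR_CLASSES_MAP = json.loads(ERROR_CLASSES_JSON)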





Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

Posted by "nchammas (via GitHub)" <gi...@apache.org>.
nchammas commented on PR #44920:
URL: https://github.com/apache/spark/pull/44920#issuecomment-2043333232

   @HyukjinKwon - Anything else you'd like to see done here?




Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #44920:
URL: https://github.com/apache/spark/pull/44920#issuecomment-2087787520

   Merged to master.

