You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "HyukjinKwon (via GitHub)" <gi...@apache.org> on 2024/01/23 03:33:35 UTC

[PR] [SPARK-46808][PYTHON] Refine error classes in Python with automatic sorting function [spark]

HyukjinKwon opened a new pull request, #44848:
URL: https://github.com/apache/spark/pull/44848

   ### What changes were proposed in this pull request?
   
   This PR proposes:
   - Add the automated way of writing `error_classes.py` file, `from pyspark.errors.exceptions import _write_self; _write_self()`
   - Fix the formatting of the JSON file to be consistent
   - Fix typos within the error messages
   - Fix parameter names to be consistent (it fixes some, not all)
   
   ### Why are the changes needed?
   
   - Now, it is difficult to add a new error class because it enforces alphabetical order or error classes, etc. When you add multiple error classes, you should manually fix and move them around which is troublesome.
   - In addition, the current JSON format isn't very consistent.
   - For consistency. This PR includes the changes of *some* of parameter naming.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, it fixes a couple of typos.
   
   ### How was this patch tested?
   
   Unittests were fixed together.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46808][PYTHON] Refine error classes in Python with automatic sorting function [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #44848:
URL: https://github.com/apache/spark/pull/44848#discussion_r1462697579


##########
python/pyspark/sql/dataframe.py:
##########
@@ -1319,6 +1319,7 @@ def hint(
                     error_class="DISALLOWED_TYPE_FOR_CONTAINER",
                     message_parameters={
                         "arg_name": "parameters",
+                        "arg_type": type(parameters).__name__,

Review Comment:
   Hmm... this also seems to be problematic. It seems that an error should have occurred if a parameter defined in the template was actually missing. Let me investigate this too.



##########
python/pyspark/errors/error_classes.py:
##########
@@ -253,58 +258,58 @@
       "<right_dtype>"
     ]
   },
-  "DIFFERENT_ROWS" : {
-    "message" : [
+  "DIFFERENT_ROWS": {
+    "message": [
       "<error_msg>"
     ]
   },
-  "DIFFERENT_SCHEMA" : {
-    "message" : [
+  "DIFFERENT_SCHEMA": {
+    "message": [
       "Schemas do not match.",
       "--- actual",
       "+++ expected",
       "<error_msg>"
     ]
   },
-  "DISALLOWED_TYPE_FOR_CONTAINER" : {
-    "message" : [
-      "Argument `<arg_name>`(type: <arg_type>) should only contain a type in [<allowed_types>], got <return_type>"
+  "DISALLOWED_TYPE_FOR_CONTAINER": {
+    "message": [
+      "Argument `<arg_name>`(type: <arg_type>) should only contain a type in [<allowed_types>], got <item_type>"

Review Comment:
   Oh... does it mean that the parameter name was not matched between template and actual usage but didn't raise any error so far? Let me investigate and create a ticket for it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46808][PYTHON] Refine error classes in Python with automatic sorting function [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #44848:
URL: https://github.com/apache/spark/pull/44848#issuecomment-1905234306

   cc @itholic 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46808][PYTHON] Refine error classes in Python with automatic sorting function [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #44848:
URL: https://github.com/apache/spark/pull/44848#issuecomment-1905266849

   Sure. let me put this ticket under that.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46808][PYTHON] Refine error classes in Python with automatic sorting function [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #44848:
URL: https://github.com/apache/spark/pull/44848#discussion_r1462708135


##########
python/pyspark/errors/exceptions/__init__.py:
##########
@@ -14,3 +14,43 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
+
+
+def _write_self() -> None:
+    import json
+    from pyspark.errors import error_classes
+
+    with open("python/pyspark/errors/error_classes.py", "w") as f:
+        error_class_py_file = """#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# NOTE: Automatically sort this file via

Review Comment:
   SGTM



##########
python/pyspark/errors/exceptions/__init__.py:
##########
@@ -14,3 +14,43 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
+
+
+def _write_self() -> None:
+    import json
+    from pyspark.errors import error_classes
+
+    with open("python/pyspark/errors/error_classes.py", "w") as f:
+        error_class_py_file = """#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# NOTE: Automatically sort this file via

Review Comment:
   SGTM



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46808][PYTHON] Refine error classes in Python with automatic sorting function [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon closed pull request #44848: [SPARK-46808][PYTHON] Refine error classes in Python with automatic sorting function
URL: https://github.com/apache/spark/pull/44848


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46808][PYTHON] Refine error classes in Python with automatic sorting function [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #44848:
URL: https://github.com/apache/spark/pull/44848#issuecomment-1905448983

   Merged to master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46808][PYTHON] Refine error classes in Python with automatic sorting function [spark]

Posted by "nchammas (via GitHub)" <gi...@apache.org>.
nchammas commented on code in PR #44848:
URL: https://github.com/apache/spark/pull/44848#discussion_r1462706729


##########
python/pyspark/errors/exceptions/__init__.py:
##########
@@ -14,3 +14,43 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
+
+
+def _write_self() -> None:
+    import json
+    from pyspark.errors import error_classes
+
+    with open("python/pyspark/errors/error_classes.py", "w") as f:
+        error_class_py_file = """#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# NOTE: Automatically sort this file via

Review Comment:
   If `error_classes.py` is not meant to be edited manually, I would add a clear warning here so people don't mistakenly edit the file, similar to #44847.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46808][PYTHON] Refine error classes in Python with automatic sorting function [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #44848:
URL: https://github.com/apache/spark/pull/44848#discussion_r1462708431


##########
python/pyspark/errors/exceptions/__init__.py:
##########
@@ -14,3 +14,43 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
+
+
+def _write_self() -> None:
+    import json
+    from pyspark.errors import error_classes
+
+    with open("python/pyspark/errors/error_classes.py", "w") as f:
+        error_class_py_file = """#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# NOTE: Automatically sort this file via

Review Comment:
   Oh actually this case is slightly different. Has to be manually edited first, and then reformatted via that code :-).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46808][PYTHON] Refine error classes in Python with automatic sorting function [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #44848:
URL: https://github.com/apache/spark/pull/44848#discussion_r1462705835


##########
python/pyspark/errors/error_classes.py:
##########
@@ -253,58 +258,58 @@
       "<right_dtype>"
     ]
   },
-  "DIFFERENT_ROWS" : {
-    "message" : [
+  "DIFFERENT_ROWS": {
+    "message": [
       "<error_msg>"
     ]
   },
-  "DIFFERENT_SCHEMA" : {
-    "message" : [
+  "DIFFERENT_SCHEMA": {
+    "message": [
       "Schemas do not match.",
       "--- actual",
       "+++ expected",
       "<error_msg>"
     ]
   },
-  "DISALLOWED_TYPE_FOR_CONTAINER" : {
-    "message" : [
-      "Argument `<arg_name>`(type: <arg_type>) should only contain a type in [<allowed_types>], got <return_type>"
+  "DISALLOWED_TYPE_FOR_CONTAINER": {
+    "message": [
+      "Argument `<arg_name>`(type: <arg_type>) should only contain a type in [<allowed_types>], got <item_type>"

Review Comment:
   Yeah. I think it wasn't being tested.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46808][PYTHON] Refine error classes in Python with automatic sorting function [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on PR #44848:
URL: https://github.com/apache/spark/pull/44848#issuecomment-1905258034

   > Fix parameter names to be consistent (it fixes some, not all)
   
   We might need further effort to refine the error class (duplicated name, consistency of messages, etc.). Will add some more items into SPARK-45673. (and maybe also include this ticket also into SPARK-45673 ?)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org