You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "itholic (via GitHub)" <gi...@apache.org> on 2023/08/24 13:20:52 UTC

[GitHub] [spark] itholic opened a new pull request, #42658: [SPARK-44945][DOCS][PYTHON] Automate PySpark error class documentation

itholic opened a new pull request, #42658:
URL: https://github.com/apache/spark/pull/42658

   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   
   This PR introduces a new script named `errors_doc_gen.py` that automatically generates an RST documentation file containing the error classes defined in `pyspark.errors.error_classes`.
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   With the increasing number of error classes in PySpark, it becomes imperative to have a centralized documentation that provides an overview of these errors for both developers and users. This automated script ensures that the documentation is always up-to-date with the error classes defined in the source code, reducing the manual effort required to update the documentation each time a new error class is introduced or an existing one is modified.
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Spark versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   No, this script is primarily for internal use by developers to generate documentation. End-users will benefit from the updated documentation that this script produces, but they will not directly interact with the script itself.
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   If benchmark tests were added, please run the benchmarks in GitHub Actions for the consistent environment, and the instructions could accord to: https://spark.apache.org/developer-tools.html#github-workflow-benchmarks.
   -->
   The script was executed locally, and the generated RST file was verified for accuracy and completeness. Furthermore, the generated RST was rendered using Sphinx to ensure it produces the expected visual output.
   
   <img width="1395" alt="Screenshot 2023-08-24 at 10 19 52 PM" src="https://github.com/apache/spark/assets/44108233/00142374-2fba-43c1-9034-c5db6989cffa">
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   <!--
   If generative AI tooling has been used in the process of authoring this patch, please include the
   phrase: 'Generated-by: ' followed by the name of the tool and its version.
   If no, write 'No'.
   Please refer to the [ASF Generative Tooling Guidance](https://www.apache.org/legal/generative-tooling.html) for details.
   -->
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] itholic commented on a diff in pull request #42658: [SPARK-44945][DOCS][PYTHON] Automate PySpark error class documentation

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #42658:
URL: https://github.com/apache/spark/pull/42658#discussion_r1304327400


##########
python/pyspark/errors_doc_gen.py:
##########
@@ -0,0 +1,57 @@
+import re
+from pyspark.errors.error_classes import ERROR_CLASSES_MAP
+
+
+def generate_errors_doc(output_rst_file_path: str) -> None:
+    """
+    Generates a reStructuredText (RST) documentation file for PySpark error classes.
+
+    This function fetches error classes defined in `pyspark.errors.error_classes`
+    and writes them into an RST file. The generated RST file provides an overview
+    of common, named error classes returned by PySpark.
+
+    Parameters
+    ----------
+    output_rst_file_path : str
+        The file path where the RST documentation will be written.
+
+    Notes
+    -----
+    The generated RST file can be rendered using Sphinx to visualize the documentation.
+    """
+    header = """..  Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+..    http://www.apache.org/licenses/LICENSE-2.0
+
+..  Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+========================
+Error classes in PySpark
+========================
+
+This is a list of common, named error classes returned by PySpark which are defined at `error_classes.py <https://github.com/apache/spark/blob/master/python/pyspark/errors/error_classes.py>`_.
+
+When writing PySpark errors, developers must use an error class from the list. If an appropriate error class is not available, add a new one into the list. For more information, please refer to `Contributing Error and Exception <https://spark.apache.org/docs/latest/api/python/development/contributing.html#contributing-error-and-exception>`_.
+"""  # noqa

Review Comment:
   Using `# noqa` here to bypass line length limit in flake8. This is necessary to maintain the format and readability of the RST documentation string.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] itholic commented on pull request #42658: [SPARK-44945][DOCS][PYTHON] Automate PySpark error class documentation

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on PR #42658:
URL: https://github.com/apache/spark/pull/42658#issuecomment-1692611581

   I see the documentation build is succeed on CI:
   ```
   copying static files... ... done
   copying extra files... done
   dumping search index in English (code: en)... done
   dumping object inventory... done
   build succeeded.
   
   The HTML pages are in build/html.
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] bjornjorgensen commented on pull request #42658: [SPARK-44945][DOCS][PYTHON] Automate PySpark error class documentation

Posted by "bjornjorgensen (via GitHub)" <gi...@apache.org>.
bjornjorgensen commented on PR #42658:
URL: https://github.com/apache/spark/pull/42658#issuecomment-1692081328

   HINT 
   it is quite easy to see that you are using AI tools on this.
   ```
               f.write("-" * len(error_key) + "\n\n")  # Underline for the error key
               messages = error_details["message"]
               for message in messages:
                   # Escape parentheses with a backslash when they follow a backtick
                   message = re.sub(r"`(\()", r"`\\\1", message)
                   f.write(message + "\n")
               f.write("\n")  # Add an empty line for better readability
   ```
   
   I think you have to answer Yes to this one
   Was this patch authored or co-authored using generative AI tooling?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #42658: [SPARK-44945][DOCS][PYTHON] Automate PySpark error class documentation

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #42658:
URL: https://github.com/apache/spark/pull/42658#issuecomment-1692614236

   I think it's fine. In fact it's not the part of the actual code (and not even part of the release)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] itholic commented on a diff in pull request #42658: [SPARK-44945][DOCS][PYTHON] Automate PySpark error class documentation

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #42658:
URL: https://github.com/apache/spark/pull/42658#discussion_r1304333045


##########
python/docs/source/development/errors.rst:
##########
@@ -1,92 +0,0 @@
-..  Licensed to the Apache Software Foundation (ASF) under one

Review Comment:
   The `python/docs/source/development/errors.rst` file was removed as it's now dynamically generated during documentation build. This ensures that the error documentation is always up-to-date with the source code.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon closed pull request #42658: [SPARK-44945][DOCS][PYTHON] Automate PySpark error class documentation

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon closed pull request #42658: [SPARK-44945][DOCS][PYTHON] Automate PySpark error class documentation
URL: https://github.com/apache/spark/pull/42658


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #42658: [SPARK-44945][DOCS][PYTHON] Automate PySpark error class documentation

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #42658:
URL: https://github.com/apache/spark/pull/42658#issuecomment-1692614581

   You can even just rewrite the comment on your own @itholic 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] itholic commented on pull request #42658: [SPARK-44945][DOCS][PYTHON] Automate PySpark error class documentation

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on PR #42658:
URL: https://github.com/apache/spark/pull/42658#issuecomment-1692602464

   > I think you have to answer Yes to this one
   Was this patch authored or co-authored using generative AI tooling?
   
   Actually I asked the generative AI to suggest a comment, but this script is just a slight modification of the code I wrote a long while ago :-)
   
   That is, excluding the two comments, `# Underline for the error key` and `# Add an empty line for better readability`, the rest of the entire code was not written with the assistance of AI tooling at all.
   
   I wonder if we can still consider AI as a co-author in such cases? If so, I would simply remove the two comments added by the AI instead of answer "Yes" since it seems not make enough sense to me.
   
   cc @zero323 @HyukjinKwon Can I happen to ask your thought on this? This seems like a bit of a subtle issue


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #42658: [SPARK-44945][DOCS][PYTHON] Automate PySpark error class documentation

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #42658:
URL: https://github.com/apache/spark/pull/42658#issuecomment-1694976628

   Merged to master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] itholic commented on pull request #42658: [SPARK-44945][DOCS][PYTHON] Automate PySpark error class documentation

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on PR #42658:
URL: https://github.com/apache/spark/pull/42658#issuecomment-1692603807

   > I think you have to answer Yes to this one
   Was this patch authored or co-authored using generative AI tooling?
   
   Actually I asked the generative AI to suggest a comment, but basically this script is just a slight modification of the code I wrote a long while ago :-)
   
   That is, excluding the two comments, `# Underline for the error key` and `# Add an empty line for better readability`, the rest of the entire code was not written with the assistance of AI tooling at all.
   
   I wonder if we can still consider AI as a co-author in such cases? If so, I would simply remove the two comments suggested by the AI instead of answer "Yes" since it seems not make enough sense to me.
   
   cc @zero323 @HyukjinKwon Can I happen to ask your thought on this? This seems like a bit of a subtle issue


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] itholic commented on pull request #42658: [SPARK-44945][DOCS][PYTHON] Automate PySpark error class documentation

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on PR #42658:
URL: https://github.com/apache/spark/pull/42658#issuecomment-1692630251

   Updated 2 comments with self-written comments:
   ```diff
   - # Underline for the error key
   + # The length of the error class name and underline must be the same
   + # to satisfy the RST format.
   ```
   
   ```diff
   - # Add an empty line for better readability
   + # Add 2 new lines between the descriptions of each error class
   + # to improve the readability of the generated RST file.
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org