You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "itholic (via GitHub)" <gi...@apache.org> on 2023/11/25 20:53:41 UTC

[PR] [SPARK-46103][PYTHON][BUILD][DOCS] Enhancing PySpark documentation [spark]

itholic opened a new pull request, #44012:
URL: https://github.com/apache/spark/pull/44012

   ### What changes were proposed in this pull request?
   
   This PR proposes to enhance the PySpark documentation by leveraging modern Sphinx features and functionalities. The primary objective is to improve the overall user experience and readability of the documentation. To achieve this, the PR includes an upgrade of `Sphinx` and `Jinja2` to their latest versions, enabling us to use the latest `pydata_sphinx_theme` features such as light/dark mode toggling.
   
   ### Why are the changes needed?
   
   Currently, the PySpark documentation is unable to utilize many of the advanced features available in recent `Sphinx` versions due to older package versions. This limitation hinders the documentation's visual appeal and usability, particularly when compared to other projects like Pandas which have already adopted these enhancements. For example:
   
   ## Pandas API reference (better layout / switching light & dark mode available)
   <img width="1409" alt="Screenshot 2023-11-26 at 5 43 29 AM" src="https://github.com/apache/spark/assets/44108233/0f97ce4a-c1ec-47fb-9295-445c2d557393">
   
   <img width="1403" alt="Screenshot 2023-11-26 at 5 45 01 AM" src="https://github.com/apache/spark/assets/44108233/715f74a8-9e49-4c05-80ef-5531d2e68220">
   
   
   ## Pandas-on-Spark API reference (less readable compare to pandas / only light mode)
   <img width="1312" alt="Screenshot 2023-11-26 at 5 43 48 AM" src="https://github.com/apache/spark/assets/44108233/722d2b61-e231-4387-a5ab-dcd447045d94">
   
   
   
   By updating the `Sphinx` and `Jinja2` versions, we can significantly improve the documentation's layout, design, and interactive features, thereby enhancing the end-user experience.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No API changes, but users will notice a more modern and user-friendly interface in the PySpark documentation. New features like light/dark mode and improved page layouts will be available as below:
   
   ## Dark mode
   <img width="1396" alt="Screenshot 2023-11-26 at 5 50 46 AM" src="https://github.com/apache/spark/assets/44108233/83e082c5-f7eb-4463-a6b8-197a60633494">
   
   
   ## Light mode
   <img width="1394" alt="Screenshot 2023-11-26 at 5 51 01 AM" src="https://github.com/apache/spark/assets/44108233/e50acf33-8b4d-44e1-9f56-2ab2b0054ff4">
   
   
   ### How was this patch tested?
   
   Manually built docs from local environment, and also tested combinations between various `Jinja2`, `Sphinx` and `pydata_sphinx_theme` versions for best document rendering.
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46103][PYTHON][BUILD][DOCS] Enhancing PySpark documentation [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #44012:
URL: https://github.com/apache/spark/pull/44012#discussion_r1405238199


##########
python/docs/source/reference/pyspark.pandas/frame.rst:
##########
@@ -319,8 +320,8 @@ specific plotting methods of the form ``DataFrame.plot.<kind>``.
 
 .. autosummary::
    :toctree: api/
+   :template: autosummary/accessor_method.rst
 
-   DataFrame.plot

Review Comment:
   In newer versions of Sphinx, the build will fail because `DataFrame.plot` and `Series.plot` are determined to be duplicates of the list of functions described below.
   
   In fact, this behavior seems reasonable since `.plot` is simply an accessor keyword and not a function, so I believe we can just simply leave it out of the document.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46103][PYTHON][BUILD][DOCS] Enhancing PySpark documentation [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #44012:
URL: https://github.com/apache/spark/pull/44012#discussion_r1405236561


##########
python/docs/source/_templates/autosummary/class_with_docs.rst:
##########
@@ -47,7 +47,9 @@
 
     .. autosummary::
     {% for item in attributes %}
-       ~{{ name }}.{{ item }}
+        {% if not (item == 'uid') %}

Review Comment:
   We should manually exclude `uid` from documentation because it is an internal property. We don't include them our current documentation as well, but for some reason newer Sphinx version trying to generate the internal property.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46103][PYTHON][INFRA][BUILD][DOCS] Enhancing PySpark documentation [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #44012:
URL: https://github.com/apache/spark/pull/44012#discussion_r1405239729


##########
dev/requirements.txt:
##########
@@ -31,12 +31,12 @@ pandas-stubs<1.2.0.54
 mkdocs
 
 # Documentation (Python)
-pydata_sphinx_theme
+pydata_sphinx_theme==0.13
 ipython
 nbsphinx
 numpydoc
-jinja2<3.0.0
-sphinx<3.1.0
+jinja2
+sphinx==4.2.0

Review Comment:
   I see that other Sphinx versions do not generate documentation properly for some reason. I have tested as many combinations as possible with Jinja2 and pydata_sphinx_theme, but I have confirmed that Sphinx version 4.2.0 currently renders documents in the most optimal form. Will investigate further in the future to support the latest Sphinx if necessary.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46103][PYTHON][BUILD][DOCS] Enhancing PySpark documentation [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #44012:
URL: https://github.com/apache/spark/pull/44012#discussion_r1405234673


##########
python/docs/source/conf.py:
##########
@@ -194,7 +194,11 @@
 # further.  For a list of options available for each theme, see the
 # documentation.
 html_theme_options = {
-    "navbar_end": ["version-switcher"]
+    "navbar_end": ["version-switcher", "theme-switcher"],
+    "logo": {
+        "image_light": "_static/spark-logo-light.png",
+        "image_dark": "_static/spark-logo-dark.png",

Review Comment:
   FYI: The default mode for light/dark is `auto`, which will choose a theme based on the system settings from user, but we can choose one of `dark` or `light` as default manually.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46103][PYTHON][BUILD][DOCS] Enhancing PySpark documentation [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #44012:
URL: https://github.com/apache/spark/pull/44012#discussion_r1405236561


##########
python/docs/source/_templates/autosummary/class_with_docs.rst:
##########
@@ -47,7 +47,9 @@
 
     .. autosummary::
     {% for item in attributes %}
-       ~{{ name }}.{{ item }}
+        {% if not (item == 'uid') %}

Review Comment:
   We should manually exclude `uid` from documentation because it is an internal property. We don't include them our current documentation as well, but for some reason newer Sphinx version trying to generate the internal property which is not correct.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46103][PYTHON][BUILD][DOCS] Enhancing PySpark documentation [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #44012:
URL: https://github.com/apache/spark/pull/44012#discussion_r1405232095


##########
python/docs/source/reference/pyspark.pandas/frame.rst:
##########
@@ -299,6 +299,7 @@ in Spark. These can be accessed by ``DataFrame.spark.<function/property>``.
 
 .. autosummary::
    :toctree: api/
+   :template: autosummary/accessor_method.rst

Review Comment:
   For example, previously the rst file was created as follows:
   
   ```rst
   pyspark.sql.SparkSession.builder.appName
   ========================================
   
   .. currentmodule:: pyspark.sql.SparkSession
   
   .. automethod:: builder.appName
   ```
   However, in newer Sphinx versions it is generated like this:
   
   ```rst
   pyspark.sql.SparkSession.builder.appName
   ========================================
   
   .. currentmodule:: pyspark.sql
   
   .. automethod:: SparkSession.builder.appName
   ```
   
   In the case of functions used through internal classes or accessors like this, the package paths created in a new way will cause Sphinx build to fail.
   
   See also https://github.com/sphinx-doc/sphinx/issues/7551.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46103][PYTHON][BUILD][DOCS] Enhancing PySpark documentation [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #44012:
URL: https://github.com/apache/spark/pull/44012#discussion_r1405234673


##########
python/docs/source/conf.py:
##########
@@ -194,7 +194,11 @@
 # further.  For a list of options available for each theme, see the
 # documentation.
 html_theme_options = {
-    "navbar_end": ["version-switcher"]
+    "navbar_end": ["version-switcher", "theme-switcher"],
+    "logo": {
+        "image_light": "_static/spark-logo-light.png",
+        "image_dark": "_static/spark-logo-dark.png",

Review Comment:
   FYI: The default mode for light/dark is `auto`, which will choose a theme based on the system settings from user, but we can specify one of `dark` or `light` as default manually.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46103][PYTHON][INFRA][BUILD][DOCS] Enhancing PySpark documentation [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on PR #44012:
URL: https://github.com/apache/spark/pull/44012#issuecomment-1826501628

   Documentation build passed: https://github.com/itholic/spark/actions/runs/6991575840/job/19023836477


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46103][PYTHON][INFRA][BUILD][DOCS] Enhancing PySpark documentation [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon closed pull request #44012: [SPARK-46103][PYTHON][INFRA][BUILD][DOCS] Enhancing PySpark documentation
URL: https://github.com/apache/spark/pull/44012


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46103][PYTHON][INFRA][BUILD][DOCS] Enhancing PySpark documentation [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #44012:
URL: https://github.com/apache/spark/pull/44012#discussion_r1405234673


##########
python/docs/source/conf.py:
##########
@@ -194,7 +194,11 @@
 # further.  For a list of options available for each theme, see the
 # documentation.
 html_theme_options = {
-    "navbar_end": ["version-switcher"]
+    "navbar_end": ["version-switcher", "theme-switcher"],
+    "logo": {
+        "image_light": "_static/spark-logo-light.png",
+        "image_dark": "_static/spark-logo-dark.png",

Review Comment:
   FYI: The default mode for light/dark is `auto`, which will choose a theme based on the system settings from user, but we can specify one of `dark` or `light` as default manually if we want.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46103][PYTHON][BUILD][DOCS] Enhancing PySpark documentation [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #44012:
URL: https://github.com/apache/spark/pull/44012#discussion_r1405232095


##########
python/docs/source/reference/pyspark.pandas/frame.rst:
##########
@@ -299,6 +299,7 @@ in Spark. These can be accessed by ``DataFrame.spark.<function/property>``.
 
 .. autosummary::
    :toctree: api/
+   :template: autosummary/accessor_method.rst

Review Comment:
   For example, previously the rst file was created as follows:
   
   ```rst
   pyspark.sql.SparkSession.builder.appName
   ========================================
   
   .. currentmodule:: pyspark.sql.SparkSession
   
   .. automethod:: builder.appName
   ```
   However, in newer Sphinx versions it is generated like this:
   
   ```rst
   pyspark.sql.SparkSession.builder.appName
   ========================================
   
   .. currentmodule:: pyspark.sql
   
   .. automethod:: SparkSession.builder.appName
   ```
   
   In the case of functions used through internal classes or accessors like this, the package paths created in a new way will cause Sphinx build to fail. See https://github.com/sphinx-doc/sphinx/issues/7551 related issue. Therefore, we should use the customized template such as Pandas does.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46103][PYTHON][BUILD][DOCS] Enhancing PySpark documentation [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #44012:
URL: https://github.com/apache/spark/pull/44012#discussion_r1405232095


##########
python/docs/source/reference/pyspark.pandas/frame.rst:
##########
@@ -299,6 +299,7 @@ in Spark. These can be accessed by ``DataFrame.spark.<function/property>``.
 
 .. autosummary::
    :toctree: api/
+   :template: autosummary/accessor_method.rst

Review Comment:
   For example, previously the rst file was created as follows:
   
   ```rst
   pyspark.sql.SparkSession.builder.appName
   ========================================
   
   .. currentmodule:: pyspark.sql.SparkSession
   
   .. automethod:: builder.appName
   ```
   However, in newer Sphinx versions it is generated like this:
   
   ```rst
   pyspark.sql.SparkSession.builder.appName
   ========================================
   
   .. currentmodule:: pyspark.sql
   
   .. automethod:: SparkSession.builder.appName
   ```
   
   In the case of functions used through internal classes or accessors like this, the package paths created in a new way will cause Sphinx build to fail.
   
   Therefore, I modified the path to be customized for these cases using a custom template such as Pandas does.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46103][PYTHON][BUILD][DOCS] Enhancing PySpark documentation [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #44012:
URL: https://github.com/apache/spark/pull/44012#discussion_r1405240579


##########
dev/requirements.txt:
##########
@@ -31,12 +31,12 @@ pandas-stubs<1.2.0.54
 mkdocs
 
 # Documentation (Python)
-pydata_sphinx_theme
+pydata_sphinx_theme==0.13

Review Comment:
   I followed [the version used in Pandas](https://github.com/pandas-dev/pandas/blob/main/requirements-dev.txt#L64), and actually I believe this version render the document in the most optimal form.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46103][PYTHON][BUILD][DOCS] Enhancing PySpark documentation [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #44012:
URL: https://github.com/apache/spark/pull/44012#discussion_r1405236561


##########
python/docs/source/_templates/autosummary/class_with_docs.rst:
##########
@@ -47,7 +47,9 @@
 
     .. autosummary::
     {% for item in attributes %}
-       ~{{ name }}.{{ item }}
+        {% if not (item == 'uid') %}

Review Comment:
   We should manually exclude `uid` from documentation because it is an internal property. We don't include them our current documentation as well, but for some reason newer Sphinx version trying to generate the internal property unexpectedly.



##########
python/docs/source/_templates/autosummary/class_with_docs.rst:
##########
@@ -47,7 +47,9 @@
 
     .. autosummary::
     {% for item in attributes %}
-       ~{{ name }}.{{ item }}
+        {% if not (item == 'uid') %}

Review Comment:
   We should manually exclude `uid` from documentation because it is an internal property. We don't include them our current documentation as well, but for some reason newer Sphinx version trying to generate the internal property unexpectedly.



##########
python/docs/source/_templates/autosummary/class_with_docs.rst:
##########
@@ -47,7 +47,9 @@
 
     .. autosummary::
     {% for item in attributes %}
-       ~{{ name }}.{{ item }}
+        {% if not (item == 'uid') %}

Review Comment:
   We should manually exclude uid from documentation because it is an internal property. We don't include them our current documentation as well, but for some reason newer Sphinx version trying to generate the internal property unexpectedly.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46103][PYTHON][BUILD][DOCS] Enhancing PySpark documentation [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #44012:
URL: https://github.com/apache/spark/pull/44012#discussion_r1405240579


##########
dev/requirements.txt:
##########
@@ -31,12 +31,12 @@ pandas-stubs<1.2.0.54
 mkdocs
 
 # Documentation (Python)
-pydata_sphinx_theme
+pydata_sphinx_theme==0.13

Review Comment:
   I followed [the version used in Pandas](https://github.com/pandas-dev/pandas/blob/main/requirements-dev.txt#L64), and actually I believe this version render the document in the most optimal form after doing several version testing.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46103][PYTHON][BUILD][DOCS] Enhancing PySpark documentation [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #44012:
URL: https://github.com/apache/spark/pull/44012#discussion_r1405240335


##########
dev/requirements.txt:
##########
@@ -31,12 +31,12 @@ pandas-stubs<1.2.0.54
 mkdocs
 
 # Documentation (Python)
-pydata_sphinx_theme
+pydata_sphinx_theme==0.13

Review Comment:
   I followed [the version used in Pandas](https://github.com/pandas-dev/pandas/blob/main/requirements-dev.txt#L64), and actually tested several versions, rendering the document in the most attractive form.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46103][PYTHON][BUILD][DOCS] Enhancing PySpark documentation [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #44012:
URL: https://github.com/apache/spark/pull/44012#discussion_r1405231062


##########
python/docs/source/reference/pyspark.pandas/frame.rst:
##########
@@ -299,6 +299,7 @@ in Spark. These can be accessed by ``DataFrame.spark.<function/property>``.
 
 .. autosummary::
    :toctree: api/
+   :template: autosummary/accessor_method.rst

Review Comment:
   With the new version of Sphinx, the package name creation rules for `rst` files that are automatically created when building documents have changed, so we must manually adjust the package path using these templates.
   
   This behavior is [used in the same way in Pandas](https://github.com/pandas-dev/pandas/tree/main/doc/_templates/autosummary), so I referred to it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46103][PYTHON][INFRA][BUILD][DOCS] Enhancing PySpark documentation [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #44012:
URL: https://github.com/apache/spark/pull/44012#discussion_r1405239729


##########
dev/requirements.txt:
##########
@@ -31,12 +31,12 @@ pandas-stubs<1.2.0.54
 mkdocs
 
 # Documentation (Python)
-pydata_sphinx_theme
+pydata_sphinx_theme==0.13
 ipython
 nbsphinx
 numpydoc
-jinja2<3.0.0
-sphinx<3.1.0
+jinja2
+sphinx==4.2.0

Review Comment:
   I see that other Sphinx versions do not generate documentation properly for some reason. I have tested as many combinations as possible with Jinja2 and pydata_sphinx_theme, but I have confirmed that Sphinx version 4.2.0 currently renders documents in the most optimal form. Will investigate further in the future to support the latest Sphinx If necessary.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46103][PYTHON][INFRA][BUILD][DOCS] Enhancing PySpark documentation [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #44012:
URL: https://github.com/apache/spark/pull/44012#issuecomment-1826942796

   Merged to master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46103][PYTHON][BUILD][DOCS] Enhancing PySpark documentation [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #44012:
URL: https://github.com/apache/spark/pull/44012#discussion_r1405232095


##########
python/docs/source/reference/pyspark.pandas/frame.rst:
##########
@@ -299,6 +299,7 @@ in Spark. These can be accessed by ``DataFrame.spark.<function/property>``.
 
 .. autosummary::
    :toctree: api/
+   :template: autosummary/accessor_method.rst

Review Comment:
   For example, previously the rst file was created as follows:
   
   ```rst
   pyspark.sql.SparkSession.builder.appName
   ========================================
   
   .. currentmodule:: pyspark.sql.SparkSession
   
   .. automethod:: builder.appName
   ```
   However, in newer Sphinx versions it is generated like this:
   
   ```rst
   pyspark.sql.SparkSession.builder.appName
   ========================================
   
   .. currentmodule:: pyspark.sql
   
   .. automethod:: SparkSession.builder.appName
   ```
   
   In the case of functions used through internal classes or accessors like this, the package paths created in a new way will cause Sphinx build to fail. See https://github.com/sphinx-doc/sphinx/issues/7551 related issue.
   
   Therefore, I modified the path to be customized for these cases using a custom template such as Pandas does.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46103][PYTHON][BUILD][DOCS] Enhancing PySpark documentation [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #44012:
URL: https://github.com/apache/spark/pull/44012#discussion_r1405232095


##########
python/docs/source/reference/pyspark.pandas/frame.rst:
##########
@@ -299,6 +299,7 @@ in Spark. These can be accessed by ``DataFrame.spark.<function/property>``.
 
 .. autosummary::
    :toctree: api/
+   :template: autosummary/accessor_method.rst

Review Comment:
   For example, previously the rst file was created as follows:
   
   ```rst
   pyspark.sql.SparkSession.builder.appName
   ========================================
   
   .. currentmodule:: pyspark.sql.SparkSession
   
   .. automethod:: builder.appName
   ```
   However, in newer Sphinx versions it is generated like this:
   
   ```rst
   pyspark.sql.SparkSession.builder.appName
   ========================================
   
   .. currentmodule:: pyspark.sql
   
   .. automethod:: SparkSession.builder.appName
   ```
   
   In the case of functions used through internal classes or accessors like this, the package paths created in a new way will cause Sphinx build to fail.
   
   See also https://github.com/sphinx-doc/sphinx/issues/7551.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46103][PYTHON][BUILD][DOCS] Enhancing PySpark documentation [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #44012:
URL: https://github.com/apache/spark/pull/44012#discussion_r1405238199


##########
python/docs/source/reference/pyspark.pandas/frame.rst:
##########
@@ -319,8 +320,8 @@ specific plotting methods of the form ``DataFrame.plot.<kind>``.
 
 .. autosummary::
    :toctree: api/
+   :template: autosummary/accessor_method.rst
 
-   DataFrame.plot

Review Comment:
   In newer versions of Sphinx, the build will fail because `DataFrame.plot` and `Series.plot` are determined to be duplicates of the list of functions described below such as `DataFrame.plot.area`, `DataFrame.plot.barh`, `DataFrame.plot.bar`, etc.
   
   In fact, this behavior seems reasonable since `.plot` is simply an accessor keyword and not a function, so I believe we can just simply leave it out of the document.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46103][PYTHON][BUILD][DOCS] Enhancing PySpark documentation [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #44012:
URL: https://github.com/apache/spark/pull/44012#discussion_r1405233405


##########
python/docs/source/reference/pyspark.pandas/frame.rst:
##########
@@ -299,6 +299,7 @@ in Spark. These can be accessed by ``DataFrame.spark.<function/property>``.
 
 .. autosummary::
    :toctree: api/
+   :template: autosummary/accessor_method.rst

Review Comment:
   For example, previously the rst file was created as follows:
   
   ```rst
   pyspark.sql.SparkSession.builder.appName
   ========================================
   
   .. currentmodule:: pyspark.sql.SparkSession
   
   .. automethod:: builder.appName
   ```
   However, in newer Sphinx versions it is generated like this:
   
   ```rst
   pyspark.sql.SparkSession.builder.appName
   ========================================
   
   .. currentmodule:: pyspark.sql
   
   .. automethod:: SparkSession.builder.appName
   ```
   
   In the case of functions used through internal classes or accessors like this, the package paths created in a new way will cause Sphinx build to fail. That's why we should use the customized template to correct the module path.
   
   See also https://github.com/sphinx-doc/sphinx/issues/7551.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46103][PYTHON][BUILD][DOCS] Enhancing PySpark documentation [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #44012:
URL: https://github.com/apache/spark/pull/44012#discussion_r1405232095


##########
python/docs/source/reference/pyspark.pandas/frame.rst:
##########
@@ -299,6 +299,7 @@ in Spark. These can be accessed by ``DataFrame.spark.<function/property>``.
 
 .. autosummary::
    :toctree: api/
+   :template: autosummary/accessor_method.rst

Review Comment:
   For example, previously the rst file was created as follows:
   
   ```rst
   pyspark.sql.SparkSession.builder.appName
   ========================================
   
   .. currentmodule:: pyspark.sql.SparkSession
   
   .. automethod:: builder.appName
   ```
   However, in newer Sphinx versions it is generated like this:
   
   ```rst
   pyspark.sql.SparkSession.builder.appName
   ========================================
   
   .. currentmodule:: pyspark.sql
   
   .. automethod:: SparkSession.builder.appName
   ```
   
   In the case of functions used through internal classes or accessors like this, the package paths created in a new way will cause Sphinx build to fail.
   
   Therefore, I modified the path to be customized for these cases using a custom template such as Pandas does.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46103][PYTHON][BUILD][DOCS] Enhancing PySpark documentation [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #44012:
URL: https://github.com/apache/spark/pull/44012#discussion_r1405239729


##########
dev/requirements.txt:
##########
@@ -31,12 +31,12 @@ pandas-stubs<1.2.0.54
 mkdocs
 
 # Documentation (Python)
-pydata_sphinx_theme
+pydata_sphinx_theme==0.13
 ipython
 nbsphinx
 numpydoc
-jinja2<3.0.0
-sphinx<3.1.0
+jinja2
+sphinx==4.2.0

Review Comment:
   I tested that other Sphinx versions do not generate documentation properly for some reason. I have tested as many combinations as possible with Jinja2 and pydata_sphinx_theme, but I have confirmed that Sphinx version 4.2.0 currently renders documents in the most optimal form. Will investigate further in the future to support the latest Sphinx If necessary.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46103][PYTHON][BUILD][DOCS] Enhancing PySpark documentation [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #44012:
URL: https://github.com/apache/spark/pull/44012#discussion_r1405236858


##########
python/docs/source/_templates/autosummary/class_with_docs.rst:
##########
@@ -47,7 +47,9 @@
 
     .. autosummary::
     {% for item in attributes %}
-       ~{{ name }}.{{ item }}
+        {% if not (item == 'uid') %}

Review Comment:
   We should manually exclude `uid` from documentation because it is an internal property. We don't include them our current documentation as well, but for some reason newer Sphinx version trying to generate the internal property unexpectedly.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46103][PYTHON][BUILD][DOCS] Enhancing PySpark documentation [spark]

Posted by "itholic (via GitHub)" <gi...@apache.org>.
itholic commented on code in PR #44012:
URL: https://github.com/apache/spark/pull/44012#discussion_r1405240335


##########
dev/requirements.txt:
##########
@@ -31,12 +31,12 @@ pandas-stubs<1.2.0.54
 mkdocs
 
 # Documentation (Python)
-pydata_sphinx_theme
+pydata_sphinx_theme==0.13

Review Comment:
   I followed [the version used in Pandas](https://github.com/pandas-dev/pandas/blob/main/requirements-dev.txt#L64), and actually tested several versions, rendering the document in the most attractive form.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46103][PYTHON][INFRA][BUILD][DOCS] Enhancing PySpark documentation [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #44012:
URL: https://github.com/apache/spark/pull/44012#issuecomment-1826463424

   This is a nice improvement!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org