Posted to github@arrow.apache.org by "AlenkaF (via GitHub)" <gi...@apache.org> on 2023/06/05 08:37:48 UTC

[GitHub] [arrow] AlenkaF opened a new pull request, #35907: Add a change to conf.py to build docs without pyarrow installed

AlenkaF opened a new pull request, #35907:
URL: https://github.com/apache/arrow/pull/35907

   ### Rationale for this change
   
   Ease the process of building the documentation for dev purposes.
   
   ### What changes are included in this PR?
   
   `conf.py` is updated so that the docs can be built when pyarrow is not installed (neither from source nor as a binary).
   If pyarrow is not available:
   - the `docs/source/python` folder is excluded from the documentation build
   - the documentation version is set to `'0.0.0-local-docs-build'`
   
   I have tested the changes in the following cases:
   - pyarrow built from source
   - pyarrow not installed
   - pyarrow installed from PyPI, version 12.0.0
   
   In each case I built both the full docs and only the format/developers sections.
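
The fallback described above could look roughly like the following in `conf.py`. This is a minimal sketch, not the code merged in the PR: the exclude pattern `'python'` (taken to be relative to the Sphinx source directory) and catching only `ImportError` are assumptions made for illustration.

```python
# Sketch of the fallback behaviour; assumptions are marked in the comments.
try:
    import pyarrow
    exclude_patterns = []
    version = pyarrow.__version__
except ImportError:
    # pyarrow is not importable: skip the Python docs and use a placeholder.
    pyarrow = None
    # Assumption: 'python' resolves to docs/source/python.
    exclude_patterns = ['python']
    version = '0.0.0-local-docs-build'

print(version, exclude_patterns)
```

With pyarrow importable the list stays empty and the real version is used; otherwise the Python section is skipped and the placeholder version is shown.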


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #35907: GH-35906: [Docs] Enable the build of the documentation without pyarrow built from source

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on code in PR #35907:
URL: https://github.com/apache/arrow/pull/35907#discussion_r1219574207


##########
docs/source/conf.py:
##########
@@ -38,15 +38,63 @@
 from unittest import mock
 from docutils.parsers.rst import Directive, directives
 
-import pyarrow
-
-
 sys.path.extend([
     os.path.join(os.path.dirname(__file__),
                  '..', '../..')
 
 ])
 
+# -- Customization --------------------------------------------------------
+
+try:
+    import pyarrow
+    exclude_patterns = []
+
+    # Conditional API doc generation
+
+    # Sphinx has two features for conditional inclusion:
+    # - The "only" directive
+    #   https://www.sphinx-doc.org/en/master/usage/restructuredtext/directives.html#including-content-based-on-tags
+    # - The "ifconfig" extension
+    #   https://www.sphinx-doc.org/en/master/usage/extensions/ifconfig.html
+    #
+    # Both have issues, but "ifconfig" seems to work in this setting.
+
+    try:
+        import pyarrow.cuda
+        cuda_enabled = True
+    except ImportError:
+        cuda_enabled = False
+        # Mock pyarrow.cuda to avoid autodoc warnings.
+        # XXX I can't get autodoc_mock_imports to work, so mock manually instead
+        # (https://github.com/sphinx-doc/sphinx/issues/2174#issuecomment-453177550)
+        pyarrow.cuda = sys.modules['pyarrow.cuda'] = mock.Mock()
+
+    try:
+        import pyarrow.flight
+        flight_enabled = True
+    except ImportError:
+        flight_enabled = False
+        pyarrow.flight = sys.modules['pyarrow.flight'] = mock.Mock()
+
+    try:
+        import pyarrow.orc
+        orc_enabled = True
+    except ImportError:
+        orc_enabled = False
+        pyarrow.orc = sys.modules['pyarrow.orc'] = mock.Mock()
+
+    try:
+        import pyarrow.parquet.encryption
+        parquet_encryption_enabled = True
+    except ImportError:
+        parquet_encryption_enabled = False
+        pyarrow.parquet.encryption = sys.modules['pyarrow.parquet.encryption'] = mock.Mock()
+except:

Review Comment:
   That might be specific to your local install (is it still finding the editable dev version?), so it is best to also include `ImportError` (in an env without any pyarrow installed, the import should raise an `ImportError`).
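
Catching both exception types covers the two failure modes discussed in this thread. A minimal sketch, where `try_import`, `missing_module` and `missing_version` are hypothetical helpers that simulate the failures instead of importing pyarrow:

```python
def try_import(import_fn):
    """Return True if import_fn() succeeds, False if it fails like a missing pyarrow."""
    try:
        import_fn()
        return True
    except (ImportError, LookupError):
        # ModuleNotFoundError is a subclass of ImportError, so it is caught here too.
        return False


def missing_module():
    # Simulates an environment with no pyarrow at all.
    raise ModuleNotFoundError("No module named 'pyarrow'")


def missing_version():
    # Simulates setuptools-scm failing to detect a version.
    raise LookupError("setuptools-scm was unable to detect version")


print(try_import(missing_module))   # False
print(try_import(missing_version))  # False
print(try_import(lambda: None))     # True
```

Unlike a bare `except:`, this still lets unrelated errors such as a `SyntaxError` in `conf.py` surface.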





[GitHub] [arrow] github-actions[bot] commented on pull request #35907: Add a change to conf.py to build docs without pyarrow installed

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #35907:
URL: https://github.com/apache/arrow/pull/35907#issuecomment-1576368367

   Thanks for opening a pull request!
   
   If this is not a [minor PR](https://github.com/apache/arrow/blob/main/CONTRIBUTING.md#Minor-Fixes), could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose
   
   Opening GitHub issues ahead of time contributes to the [Openness](http://theapacheway.com/open/#:~:text=Openness%20allows%20new%20users%20the,must%20happen%20in%20the%20open.) of the Apache Arrow project.
   
   Then could you also rename the pull request title in the following format?
   
       GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}
   
   or
   
       MINOR: [${COMPONENT}] ${SUMMARY}
   
   In the case of PARQUET issues on JIRA the title also supports:
   
       PARQUET-${JIRA_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}
   
   See also:
   
     * [Other pull requests](https://github.com/apache/arrow/pulls/)
     * [Contribution Guidelines - How to contribute patches](https://arrow.apache.org/docs/developers/contributing.html#how-to-contribute-patches)
   




[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #35907: GH-35906: [Docs] Enable the build of the documentation without pyarrow built from source

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on code in PR #35907:
URL: https://github.com/apache/arrow/pull/35907#discussion_r1219367424


##########
docs/source/conf.py:
##########
@@ -38,15 +38,63 @@
 from unittest import mock
 from docutils.parsers.rst import Directive, directives
 
-import pyarrow
-
-
 sys.path.extend([
     os.path.join(os.path.dirname(__file__),
                  '..', '../..')
 
 ])
 
+# -- Customization --------------------------------------------------------
+
+try:
+    import pyarrow
+    exclude_patterns = []
+
+    # Conditional API doc generation
+
+    # Sphinx has two features for conditional inclusion:
+    # - The "only" directive
+    #   https://www.sphinx-doc.org/en/master/usage/restructuredtext/directives.html#including-content-based-on-tags
+    # - The "ifconfig" extension
+    #   https://www.sphinx-doc.org/en/master/usage/extensions/ifconfig.html
+    #
+    # Both have issues, but "ifconfig" seems to work in this setting.
+
+    try:
+        import pyarrow.cuda
+        cuda_enabled = True
+    except ImportError:
+        cuda_enabled = False
+        # Mock pyarrow.cuda to avoid autodoc warnings.
+        # XXX I can't get autodoc_mock_imports to work, so mock manually instead
+        # (https://github.com/sphinx-doc/sphinx/issues/2174#issuecomment-453177550)
+        pyarrow.cuda = sys.modules['pyarrow.cuda'] = mock.Mock()
+
+    try:
+        import pyarrow.flight
+        flight_enabled = True
+    except ImportError:
+        flight_enabled = False
+        pyarrow.flight = sys.modules['pyarrow.flight'] = mock.Mock()
+
+    try:
+        import pyarrow.orc
+        orc_enabled = True
+    except ImportError:
+        orc_enabled = False
+        pyarrow.orc = sys.modules['pyarrow.orc'] = mock.Mock()
+
+    try:
+        import pyarrow.parquet.encryption
+        parquet_encryption_enabled = True
+    except ImportError:
+        parquet_encryption_enabled = False
+        pyarrow.parquet.encryption = sys.modules['pyarrow.parquet.encryption'] = mock.Mock()
+except:

Review Comment:
   ```suggestion
   except ImportError:
   ```
   
   ?



##########
docs/source/conf.py:
##########
@@ -158,11 +206,18 @@
 # built documents.
 #
 # The short X.Y version.
-version = os.environ.get('ARROW_DOCS_VERSION',
-                         pyarrow.__version__)
+try:
+    version = os.environ.get('ARROW_DOCS_VERSION',
+                             pyarrow.__version__)

Review Comment:
   I think you could try to avoid passing `pyarrow.__version__` to this function, as that will fail if pyarrow wasn't imported, even when ARROW_DOCS_VERSION is specified. 
   
   That way the version could still be set through that env variable even when pyarrow isn't installed. (I don't know how important this use case is, though.)
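
The reason the call fails is that Python evaluates the default argument before `os.environ.get` runs. A small sketch of the problem and one possible way around it; `pyarrow = None` is a stand-in for a failed import, and both the `'13.0.0'` value and the empty-string fallback are assumptions for illustration:

```python
import os

os.environ['ARROW_DOCS_VERSION'] = '13.0.0'  # hypothetical value for the demo

pyarrow = None  # stand-in for "pyarrow could not be imported"

try:
    # The default is evaluated eagerly, so this raises even though the
    # environment variable is set and the default would never be used.
    version = os.environ.get('ARROW_DOCS_VERSION', pyarrow.__version__)
except AttributeError:
    eager_lookup_failed = True
else:
    eager_lookup_failed = False

# Deferring the attribute access until it is actually needed avoids the failure.
version = os.environ.get('ARROW_DOCS_VERSION')
if version is None:
    version = pyarrow.__version__ if pyarrow is not None else ''

print(eager_lookup_failed, version)
```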





[GitHub] [arrow] AlenkaF commented on pull request #35907: GH-35906: [Docs] Enable building the documentation without having pyarrow installed

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on PR #35907:
URL: https://github.com/apache/arrow/pull/35907#issuecomment-1592865430

   This PR should be ready now @jorisvandenbossche 




[GitHub] [arrow] AlenkaF commented on a diff in pull request #35907: GH-35906: [Docs] Enable the build of the documentation without pyarrow built from source

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on code in PR #35907:
URL: https://github.com/apache/arrow/pull/35907#discussion_r1219533955


##########
docs/source/conf.py:
##########
@@ -158,11 +206,18 @@
 # built documents.
 #
 # The short X.Y version.
-version = os.environ.get('ARROW_DOCS_VERSION',
-                         pyarrow.__version__)
+try:
+    version = os.environ.get('ARROW_DOCS_VERSION',
+                             pyarrow.__version__)

Review Comment:
   That is much nicer in my opinion, yes 👍  Will try out the change.





[GitHub] [arrow] AlenkaF commented on a diff in pull request #35907: GH-35906: [Docs] Enable the build of the documentation without pyarrow built from source

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on code in PR #35907:
URL: https://github.com/apache/arrow/pull/35907#discussion_r1219532643


##########
docs/source/conf.py:
##########
@@ -38,15 +38,63 @@
 from unittest import mock
 from docutils.parsers.rst import Directive, directives
 
-import pyarrow
-
-
 sys.path.extend([
     os.path.join(os.path.dirname(__file__),
                  '..', '../..')
 
 ])
 
+# -- Customization --------------------------------------------------------
+
+try:
+    import pyarrow
+    exclude_patterns = []
+
+    # Conditional API doc generation
+
+    # Sphinx has two features for conditional inclusion:
+    # - The "only" directive
+    #   https://www.sphinx-doc.org/en/master/usage/restructuredtext/directives.html#including-content-based-on-tags
+    # - The "ifconfig" extension
+    #   https://www.sphinx-doc.org/en/master/usage/extensions/ifconfig.html
+    #
+    # Both have issues, but "ifconfig" seems to work in this setting.
+
+    try:
+        import pyarrow.cuda
+        cuda_enabled = True
+    except ImportError:
+        cuda_enabled = False
+        # Mock pyarrow.cuda to avoid autodoc warnings.
+        # XXX I can't get autodoc_mock_imports to work, so mock manually instead
+        # (https://github.com/sphinx-doc/sphinx/issues/2174#issuecomment-453177550)
+        pyarrow.cuda = sys.modules['pyarrow.cuda'] = mock.Mock()
+
+    try:
+        import pyarrow.flight
+        flight_enabled = True
+    except ImportError:
+        flight_enabled = False
+        pyarrow.flight = sys.modules['pyarrow.flight'] = mock.Mock()
+
+    try:
+        import pyarrow.orc
+        orc_enabled = True
+    except ImportError:
+        orc_enabled = False
+        pyarrow.orc = sys.modules['pyarrow.orc'] = mock.Mock()
+
+    try:
+        import pyarrow.parquet.encryption
+        parquet_encryption_enabled = True
+    except ImportError:
+        parquet_encryption_enabled = False
+        pyarrow.parquet.encryption = sys.modules['pyarrow.parquet.encryption'] = mock.Mock()
+except:

Review Comment:
   Actually, a `LookupError` is raised first:
   ```
   Configuration error:
   There is a programmable error in your configuration file:
   
   Traceback (most recent call last):
     File "/Users/alenkafrim/repos/arrow/python/pyarrow/__init__.py", line 40, in <module>
       from ._generated_version import version as __version__
   ModuleNotFoundError: No module named 'pyarrow._generated_version'
   
   During handling of the above exception, another exception occurred:
   
   Traceback (most recent call last):
     File "/Users/alenkafrim/repos/pyarrow-dev/lib/python3.10/site-packages/sphinx/config.py", line 350, in eval_config_file
       exec(code, namespace)
     File "/Users/alenkafrim/repos/arrow/docs/source/conf.py", line 50, in <module>
       import pyarrow
     File "/Users/alenkafrim/repos/arrow/python/pyarrow/__init__.py", line 56, in <module>
       __version__ = setuptools_scm.get_version('../',
     File "/Users/alenkafrim/repos/pyarrow-dev/lib/python3.10/site-packages/setuptools_scm/__init__.py", line 148, in get_version
       _version_missing(config)
     File "/Users/alenkafrim/repos/pyarrow-dev/lib/python3.10/site-packages/setuptools_scm/__init__.py", line 108, in _version_missing
       raise LookupError(
   LookupError: setuptools-scm was unable to detect version for /Users/alenkafrim/repos/arrow/docs.
   
   Make sure you're either building from a fully intact git repository or PyPI tarballs. Most other sources (such as GitHub's tarballs, a git checkout without the .git folder) don't contain the necessary metadata and will not work.
   
   For example, if you're using pip, instead of https://github.com/user/proj/archive/master.zip use git+https://github.com/user/proj.git#egg=proj
   
   make: *** [html] Error 2
   ```
   
   Will do `except LookupError:`.





[GitHub] [arrow] AlenkaF commented on a diff in pull request #35907: GH-35906: [Docs] Enable the build of the documentation without pyarrow built from source

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on code in PR #35907:
URL: https://github.com/apache/arrow/pull/35907#discussion_r1219555107


##########
docs/source/conf.py:
##########
@@ -158,11 +206,18 @@
 # built documents.
 #
 # The short X.Y version.
-version = os.environ.get('ARROW_DOCS_VERSION',
-                         pyarrow.__version__)
+try:
+    version = os.environ.get('ARROW_DOCS_VERSION',
+                             pyarrow.__version__)

Review Comment:
   So yeah, we could default to an empty string if `ARROW_DOCS_VERSION` is not specified. I guess that can't happen (env var not being set) when publishing the docs with the release, right?





[GitHub] [arrow] github-actions[bot] commented on pull request #35907: GH-35906: [Docs] Enable the build of the documentation without pyarrow built from source

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #35907:
URL: https://github.com/apache/arrow/pull/35907#issuecomment-1576370510

   :warning: GitHub issue #35906 **has been automatically assigned in GitHub** to PR creator.




[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #35907: GH-35906: [Docs] Enable building the documentation without having pyarrow installed

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on code in PR #35907:
URL: https://github.com/apache/arrow/pull/35907#discussion_r1231416154


##########
docs/source/conf.py:
##########
@@ -158,11 +206,18 @@
 # built documents.
 #
 # The short X.Y version.
-version = os.environ.get('ARROW_DOCS_VERSION',
-                         pyarrow.__version__)
+try:
+    version = os.environ.get('ARROW_DOCS_VERSION',
+                             pyarrow.__version__)

Review Comment:
   > So yeah, we could default to an empty string if `ARROW_DOCS_VERSION` is not specified.
   
   I think it would be nice to still use `pyarrow.__version__` _if_ it is installed, and the env variable is not set. It's nice to see a correct version number when building the docs locally for development (when you typically don't specify that env variable).
   
   You could for example use `pyarrow_version = pyarrow.__version__` and `pyarrow_version = ""` in the try/except branches above when trying to import pyarrow, and then use that variable here (`version = os.environ.get('ARROW_DOCS_VERSION', pyarrow_version)`)
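
Put together, the suggestion could look like the sketch below. The set of exception types caught is an assumption based on the earlier discussion, not necessarily what the final `conf.py` uses:

```python
import os

try:
    import pyarrow
    pyarrow_version = pyarrow.__version__
except (ImportError, LookupError):
    # pyarrow is unavailable: fall back to an empty version string.
    pyarrow_version = ""

# The environment variable wins when set; otherwise use whatever was found above.
version = os.environ.get('ARROW_DOCS_VERSION', pyarrow_version)
print(repr(version))
```

Because `pyarrow_version` is always defined, the `os.environ.get` call no longer depends on the import having succeeded.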





[GitHub] [arrow] AlenkaF merged pull request #35907: GH-35906: [Docs] Enable building the documentation without having pyarrow installed

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF merged PR #35907:
URL: https://github.com/apache/arrow/pull/35907




[GitHub] [arrow] jorisvandenbossche commented on pull request #35907: GH-35906: [Docs] Enable the build of the documentation without pyarrow built from source

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on PR #35907:
URL: https://github.com/apache/arrow/pull/35907#issuecomment-1591070848

   > I can try a bit further to see where I get, if you think this would be beneficial? But I do think that if one is building the docs without even a binary version of pyarrow, then the need for the Python docs is not very high.
   
   Yes, it certainly is less important (you need some Python environment anyway, since we are using Sphinx to build the docs). So let's not worry about that use case here.




[GitHub] [arrow] conbench-apache-arrow[bot] commented on pull request #35907: GH-35906: [Docs] Enable building the documentation without having pyarrow installed

Posted by "conbench-apache-arrow[bot] (via GitHub)" <gi...@apache.org>.
conbench-apache-arrow[bot] commented on PR #35907:
URL: https://github.com/apache/arrow/pull/35907#issuecomment-1602985118

   Conbench analyzed the 6 benchmark runs on commit `65d603ae`.
   
   There were no benchmark performance regressions. 🎉
   
   The [full Conbench report](https://github.com/apache/arrow/runs/14478250280) has more details.




[GitHub] [arrow] AlenkaF commented on pull request #35907: GH-35906: [Docs] Enable the build of the documentation without pyarrow built from source

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on PR #35907:
URL: https://github.com/apache/arrow/pull/35907#issuecomment-1578711123

   > One question: this excludes all of the Python docs. Longer term, should we try to _only_ exclude the API docs? (there are currently a few files in the `python/` directory that require pyarrow to be installed to run the examples in the `ipython` directives, but many files also don't have this, and could in theory be built without pyarrow being installed)
   
   Even when excluding the files with `ipython` directives ('python/data.rst', 'python/dataset.rst', 'python/getstarted.rst', 'python/ipc.rst', 'python/memory.rst', 'python/pandas.rst', 'python/parquet.rst') and 'python/filesystems_deprecated.rst' with the HDFS API autosummary list, I still get an error:
   
   ```
   
   Sphinx parallel build error:
   ModuleNotFoundError: No module named 'pyarrow.lib'
   make: *** [html] Error 2
   ```
   
   and I think it is connected to the 33 `.rst` files that have `.. currentmodule:: pyarrow` (but I am not sure). At this point I decided that going further doesn't make much sense and that it is better to just exclude all of the Python docs.
   
   I can try a bit further to see where I get, if you think this would be beneficial? But I do think that if one is building the docs without even a binary version of pyarrow, then the need for the Python docs is not very high.
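
For reference, the exclusion attempted above would correspond to an `exclude_patterns` setting along these lines. This is a reconstruction from the file names listed in this comment, not the code that was actually run, and the paths are assumed to be relative to the Sphinx source directory (`docs/source`):

```python
# Files with `ipython` directives that need pyarrow to run their examples.
ipython_pages = [
    'python/data.rst',
    'python/dataset.rst',
    'python/getstarted.rst',
    'python/ipc.rst',
    'python/memory.rst',
    'python/pandas.rst',
    'python/parquet.rst',
]

# Page with the HDFS API autosummary list.
autosummary_pages = ['python/filesystems_deprecated.rst']

exclude_patterns = ipython_pages + autosummary_pages
print(len(exclude_patterns))
```

As reported above, excluding these files alone was not enough to make the build pass.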

