Posted to github@arrow.apache.org by "jorisvandenbossche (via GitHub)" <gi...@apache.org> on 2023/02/14 13:32:36 UTC

[GitHub] [arrow] jorisvandenbossche commented on pull request #34099: GH-34098: [Python][Docs] Fix dataset docstring

jorisvandenbossche commented on PR #34099:
URL: https://github.com/apache/arrow/pull/34099#issuecomment-1429755044

   @Fokko Thanks for the PR! Generally it looks great, but in practice it seems that `.format(..)` does not work for docstrings. If I test this branch, I get empty docstrings for e.g. `pyarrow.dataset.Dataset.to_batches`.
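
A minimal pure-Python sketch of the likely cause: Python only treats a bare string literal as a docstring, so a literal followed by `.format(...)` is an ordinary expression and `__doc__` stays unset. The parameter text and the `head` method are made-up placeholders, not the actual pyarrow source.

```python
# Placeholder text, standing in for the shared parameter documentation.
_scanner_arguments_doc = "columns : list of str, default None"


class Scanner:
    def to_batches(self):
        # Not a docstring: the first statement is a method call on a string,
        # not a plain string literal, so Python does not store it in __doc__.
        """
        Read the dataset as materialized record batches.

        Parameters
        ----------
        {0}
        """.format(_scanner_arguments_doc)

    def head(self):
        """Load the first rows of the dataset."""


print(Scanner.to_batches.__doc__)  # None
print(Scanner.head.__doc__)        # the plain literal is kept as the docstring
```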
   
   pandas uses a `@doc` decorator for this, I assume to overcome this limitation (https://github.com/pandas-dev/pandas/blob/7d545f0849b8502974d119684bef744382cb55be/pandas/util/_decorators.py#L340-L398). It can do a lot more than just filling in some parts, so it's more complicated than what we would need, and I don't know whether it would work in Cython code. It would be used like this:
   
   ```
   class Scanner:
   
       @doc(_scanner_arguments_doc)
       def to_batches(..):
           """
           Read the dataset as materialized record batches.
   
           Parameters
           ----------
           {0}
           """
   ```
   
   This replaces the inline `.format` call; the `@doc` decorator basically does the same thing under the hood.
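
For illustration, a stripped-down decorator along these lines could look like the sketch below. It is a hypothetical stand-in, not the pandas implementation: the name `doc` is reused only for readability, and the parameter text is a made-up placeholder. It is plain Python and says nothing about whether the same works in Cython.

```python
def doc(*args):
    """Hypothetical, simplified stand-in for a docstring-filling decorator."""
    def decorator(func):
        # Fill the positional placeholders ({0}, {1}, ...) in the docstring.
        if func.__doc__ is not None:
            func.__doc__ = func.__doc__.format(*args)
        return func
    return decorator


# Placeholder text, standing in for the shared parameter documentation.
_scanner_arguments_doc = "columns : list of str, default None"


class Scanner:

    @doc(_scanner_arguments_doc)
    def to_batches(self):
        """
        Read the dataset as materialized record batches.

        Parameters
        ----------
        {0}
        """


print(Scanner.to_batches.__doc__)
```

Because this relies on `str.format`, any literal `{` or `}` in a docstring would have to be written as `{{` or `}}`.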
   
   
   In our own parquet module we use the simpler approach of assigning `__doc__` afterwards:
   
   https://github.com/apache/arrow/blob/ddfa8eed9b188fcc7b38767d1858c2588c588f05/python/pyarrow/parquet/core.py#L3019-L3029
   
   Similarly, you could leave the docstrings with their placeholders as in this PR, but do the string interpolation as a separate step afterwards:
   
   ```
   class Scanner:
   
       def to_batches(..):
           """
           Read the dataset as materialized record batches.
   
           Parameters
           ----------
           {0}
           """
       ...
   
   Scanner.to_batches.__doc__ = Scanner.to_batches.__doc__.format(_scanner_arguments_doc)
   ```
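
A runnable pure-Python sketch of this post-assignment approach, extended to a loop over several methods. The second method `to_table` and the parameter text are placeholders for illustration. This is untested against Cython-compiled classes, where `__doc__` may not be assignable in the same way.

```python
# Placeholder text, standing in for the shared parameter documentation.
_scanner_arguments_doc = "columns : list of str, default None"


class Scanner:

    def to_batches(self):
        """
        Read the dataset as materialized record batches.

        Parameters
        ----------
        {0}
        """

    def to_table(self):
        """
        Read the dataset into a table.

        Parameters
        ----------
        {0}
        """


# Fill in the shared parameter section once the class body has been defined.
for _name in ("to_batches", "to_table"):
    _method = getattr(Scanner, _name)
    _method.__doc__ = _method.__doc__.format(_scanner_arguments_doc)

print(Scanner.to_table.__doc__)
```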
   
   

