Posted to issues@arrow.apache.org by "danepitkin (via GitHub)" <gi...@apache.org> on 2023/04/03 20:18:04 UTC

[GitHub] [arrow] danepitkin opened a new issue, #34868: [Python] Sharing docstrings between classes

danepitkin opened a new issue, #34868:
URL: https://github.com/apache/arrow/issues/34868

   ### Describe the enhancement requested
   
   PyArrow duplicates a lot of documentation in order to provide explicit docstring examples. Let's reduce the duplication of docstrings by providing a way to share docstrings between classes. See the way `pandas` did this as an example: https://pandas.pydata.org/docs/development/contributing_docstring.html#sharing-docstrings
   
   A good example of duplication in PyArrow is the pair of classes `Table` and `RecordBatch`. Both provide similar, sometimes identical, top-level implementations and docstrings, and typically differ only in their low-level C++ implementation.
   
   Here is an example of duplicated docstring descriptions.
   `class RecordBatch:`
   ```
       @property
       def nbytes(self):
           """
           Total number of bytes consumed by the elements of the record batch.
   
           In other words, the sum of bytes from all buffer ranges referenced.
   
           Unlike `get_total_buffer_size` this method will account for array
           offsets.
   
           If buffers are shared between arrays then the shared
           portion will be counted multiple times.
   
           The dictionary of dictionary arrays will always be counted in their
           entirety even if the array only references a portion of the dictionary.
   
           Examples
           --------
           >>> import pyarrow as pa
           >>> n_legs = pa.array([2, 2, 4, 4, 5, 100])
           >>> animals = pa.array(["Flamingo", "Parrot", "Dog", "Horse", "Brittle stars", "Centipede"])
           >>> batch = pa.RecordBatch.from_arrays([n_legs, animals],
           ...                                     names=["n_legs", "animals"])
           >>> batch.nbytes
           116
           """
           ...
   ```
   
   `class Table:`
   ```
       @property
       def nbytes(self):
           """
           Total number of bytes consumed by the elements of the table.
   
           In other words, the sum of bytes from all buffer ranges referenced.
   
           Unlike `get_total_buffer_size` this method will account for array
           offsets.
   
           If buffers are shared between arrays then the shared
           portion will be counted multiple times.
   
           The dictionary of dictionary arrays will always be counted in their
           entirety even if the array only references a portion of the dictionary.
   
           Examples
           --------
           >>> import pyarrow as pa
           >>> import pandas as pd
           >>> df = pd.DataFrame({'n_legs': [None, 4, 5, None],
           ...                    'animals': ["Flamingo", "Horse", None, "Centipede"]})
           >>> table = pa.Table.from_pandas(df)
           >>> table.nbytes
           72
           """
           ...
   ```
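
For illustration, the kind of sharing requested above could be sketched in plain Python as follows. This is only a minimal sketch under stated assumptions: `share_doc` and `_NBYTES_DOC` are hypothetical names, not existing PyArrow or pandas APIs, and the two classes are bare stand-ins for the real ones. The shared description lives in one template and each class fills in only the part that differs.

```python
import textwrap

# Hypothetical shared template; {kind} is the only part that differs per class.
_NBYTES_DOC = """
Total number of bytes consumed by the elements of the {kind}.

In other words, the sum of bytes from all buffer ranges referenced.
"""


def share_doc(template, **params):
    """Return a decorator that sets __doc__ from a formatted template.

    Hypothetical helper, loosely modelled on the idea in the pandas
    contributing guide linked above.
    """
    def decorator(func):
        func.__doc__ = textwrap.dedent(template).format(**params)
        return func
    return decorator


class RecordBatch:  # stand-in class, not pyarrow.RecordBatch
    @property
    @share_doc(_NBYTES_DOC, kind="record batch")
    def nbytes(self):
        ...


class Table:  # stand-in class, not pyarrow.Table
    @property
    @share_doc(_NBYTES_DOC, kind="table")
    def nbytes(self):
        ...


# property() picks up the docstring of the decorated getter.
print(RecordBatch.nbytes.__doc__)
print(Table.nbytes.__doc__)
```

Because the sketch relies on `str.format`, a template containing literal braces (for example a dict literal in an `Examples` section) would need them doubled. Class-specific examples could be passed in as an additional placeholder.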
   
   ### Component(s)
   
   Python


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] danepitkin commented on issue #34868: [Python] Sharing docstrings between classes

Posted by "danepitkin (via GitHub)" <gi...@apache.org>.
danepitkin commented on issue #34868:
URL: https://github.com/apache/arrow/issues/34868#issuecomment-1496105553

   This won't work for Cython until this issue is fixed: https://github.com/python/cpython/issues/91309




[GitHub] [arrow] AlenkaF closed issue #34868: [Python] Sharing docstrings between classes

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF closed issue #34868: [Python] Sharing docstrings between classes
URL: https://github.com/apache/arrow/issues/34868

