You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/07/11 02:13:57 UTC

[GitHub] [arrow] krcrouse commented on pull request #13126: ARROW-12526: Pre-generating pyarrow.compute and creating a docstring additions system for pyarrow functions

krcrouse commented on PR #13126:
URL: https://github.com/apache/arrow/pull/13126#issuecomment-1179888233

   @jorisvandenbossche, I'll wait for some more guidance from you all and just respond to your comments inline for the moment. 
   
   > * I think we will probably want to commit the generated code to the repo (and in any case, doing that for now might also make it easier to review (although you already linked the file above as well)).
   
   That makes sense and probably helps in the overall structure.
   
   > * Can you explain a bit more the need for `pyarrow.docutils`? 
   
   docutils is used to process the reStructured text appendices so that they can be merged with the autogenerated docs. In this model, I propose using reSt for the function documentation additions because it's established, it's testable, and contributors will at least understand what it is even if they're not proficient in it.
   
   If your question is more about "the need for it as a required module" -  If we include the generated files then the `docutils` requirement only needs to be a build dependency in order to generate the actual code files. It does not need to be included as a requirement for the actual package for end users.
   
   > (we should also make it a private module, or put in eg python/scripts or so?)
   
   Agreed - it should be a private module.
   
   >   As far as I understand, by using the full ability of docutils, we can override _any_ section of the docstring ...
   >   Do we think we will really need that full ability? Or it might in almost all cases be sufficient to just append content? 
   
   I think we would want both options. Since the default documentation is pulling from the C++ library code, I think you could browse the current [generated documentation](https://arrow.apache.org/docs/python/api/compute.html) and see sections that are not useful and could be entirely overwritten. I also think the hybrid approach of creating default documentation with options to append and/or overwrite is best because it will pull in changes to the core C++ function interface automatically while preserving the manually provided improved pythonic documentation.
   
   Take, for example, the parameter definitions of [`replace_with_mask`](https://arrow.apache.org/docs/python/generated/pyarrow.compute.replace_with_mask.html#pyarrow.compute.replace_with_mask) - in the context of the still not terribly verbose description, the parameter types are incorrect and the explanation of each parameter is useless ('Argument to compute function."). I don't have the code in front of me at the moment, but having implemented this function in pyarrow 7.0, there are several oddities in how these parameters are handled that should be in the param descriptions and not merely as an appended paragraph at the bottom.  
   
   >   Small nit: this doesn't need to be a public module, so maybe something like `_compute_generated.py` instead?
   > * > Using raw reSt so that code block examples can be tested using doctest
   
   Agreed.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org