Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/12/02 11:25:55 UTC

[GitHub] [arrow] pitrou commented on a change in pull request #11830: ARROW-13832: [Doc] Improve compute documentation

pitrou commented on a change in pull request #11830:
URL: https://github.com/apache/arrow/pull/11830#discussion_r760996141



##########
File path: docs/source/python/api/compute.rst
##########
@@ -498,3 +498,50 @@ Structural Transforms
    make_struct
    replace_with_mask
    struct_field
+
+Compute Options
+---------------
+
+.. autosummary::
+   :toctree: ../generated/
+
+   ScalarAggregateOptions
+   CountOptions
+   TDigestOptions

Review comment:
       Put all these in alphabetical order?
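
For illustration, here is a minimal sketch of the requested ordering. It uses only the three entries visible in this hunk; the rest of the list from the PR is not reproduced here.

```python
# Option classes as they appear in the hunk above (only the visible ones).
option_classes = ["ScalarAggregateOptions", "CountOptions", "TDigestOptions"]

# sorted() compares case-sensitively, which is fine here because every
# name starts with an uppercase letter.
alphabetical = sorted(option_classes)

# Print the entries with the three-space indent used under autosummary.
for name in alphabetical:
    print(f"   {name}")
```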

##########
File path: docs/source/conf.py
##########
@@ -463,3 +464,40 @@ def setup(app):
     # This will also rebuild appropriately when the value changes.
     app.add_config_value('cuda_enabled', cuda_enabled, 'env')
     app.add_config_value('flight_enabled', flight_enabled, 'env')
+    app.add_directive('computefuncs', ComputeFunctionsTableDirective)
+
+
+class ComputeFunctionsTableDirective(Directive):
+    has_content = True
+    option_spec = {
+        "kind": directives.unchanged
+    }
+
+    def run(self):
+        from docutils.statemachine import ViewList
+        from docutils import nodes
+        import pyarrow._compute

Review comment:
       Is there a reason for importing this instead of the public `pyarrow.compute`?
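
As a rough sketch of what going through the public module could look like: `pyarrow.compute` re-exports `function_registry()`, so the loop in `run()` should not need `pyarrow._compute`. The helper name `iter_compute_functions` and its `compute_module` parameter are made up for this example and are not part of the PR. The module is passed in as an argument only so the sketch can be exercised without a PyArrow build.

```python
def iter_compute_functions(compute_module, kind=None):
    """Yield (name, function) pairs from the registry of compute_module.

    compute_module is expected to provide function_registry(), as the
    public pyarrow.compute module does. If kind is given, only functions
    whose .kind equals it are yielded.
    """
    registry = compute_module.function_registry()
    for name in registry.list_functions():
        func = registry.get_function(name)
        # Same filter as in the diff: no kind means "keep everything".
        if not kind or func.kind == kind:
            yield name, func


# Intended use in conf.py (assumes PyArrow is importable at build time):
#     import pyarrow.compute as pc
#     for name, func in iter_compute_functions(pc, kind="scalar_aggregate"):
#         ...
```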

##########
File path: docs/source/conf.py
##########
@@ -463,3 +464,40 @@ def setup(app):
     # This will also rebuild appropriately when the value changes.
     app.add_config_value('cuda_enabled', cuda_enabled, 'env')
     app.add_config_value('flight_enabled', flight_enabled, 'env')
+    app.add_directive('computefuncs', ComputeFunctionsTableDirective)

Review comment:
       For disambiguation and clarity, can we prefix our own directives with "arrow-"?
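
One possible way to apply the prefix consistently is sketched below. `add_arrow_directive` and `ARROW_DIRECTIVE_PREFIX` are hypothetical names, not something the PR defines. The only Sphinx API assumed is `app.add_directive(name, cls)`, which the diff already calls.

```python
ARROW_DIRECTIVE_PREFIX = "arrow-"


def add_arrow_directive(app, name, directive_cls):
    """Register directive_cls under an "arrow-" prefixed name.

    Returns the name that was actually registered. A name that already
    carries the prefix is left unchanged.
    """
    if name.startswith(ARROW_DIRECTIVE_PREFIX):
        full_name = name
    else:
        full_name = ARROW_DIRECTIVE_PREFIX + name
    app.add_directive(full_name, directive_cls)
    return full_name


# Intended use inside setup(app) in conf.py:
#     add_arrow_directive(app, "computefuncs", ComputeFunctionsTableDirective)
# The directive would then be written as ".. arrow-computefuncs::" in .rst files.
```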

##########
File path: docs/source/conf.py
##########
@@ -463,3 +464,40 @@ def setup(app):
     # This will also rebuild appropriately when the value changes.
     app.add_config_value('cuda_enabled', cuda_enabled, 'env')
     app.add_config_value('flight_enabled', flight_enabled, 'env')
+    app.add_directive('computefuncs', ComputeFunctionsTableDirective)
+
+
+class ComputeFunctionsTableDirective(Directive):

Review comment:
       Can you add a docstring briefly explaining what this does?

##########
File path: docs/source/conf.py
##########
@@ -463,3 +464,40 @@ def setup(app):
     # This will also rebuild appropriately when the value changes.
     app.add_config_value('cuda_enabled', cuda_enabled, 'env')
     app.add_config_value('flight_enabled', flight_enabled, 'env')
+    app.add_directive('computefuncs', ComputeFunctionsTableDirective)
+
+
+class ComputeFunctionsTableDirective(Directive):
+    has_content = True
+    option_spec = {
+        "kind": directives.unchanged
+    }
+
+    def run(self):
+        from docutils.statemachine import ViewList
+        from docutils import nodes
+        import pyarrow._compute
+
+        result = ViewList()
+        function_kind = self.options.get('kind', None)
+
+        result.append(".. csv-table::", "<computefuncs>")
+        result.append("   :widths: 20, 60, 20", "<computefuncs>")
+        result.append("   ", "<computefuncs>")
+        funcs_reg = pyarrow._compute.function_registry()
+        for fname in funcs_reg.list_functions():
+            f = funcs_reg.get_function(fname)
+            option_class = ""
+            if f._doc.options_class:
+                option_class = ":class:`{}`".format(
+                    f._doc.options_class
+                )
+            if not function_kind or f.kind == function_kind:
+                result.append('   "{}", "{}", "{}"'.format(

Review comment:
       Nit, but we can use f-strings now :-)
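
For illustration, a sketch of the f-string form. The arguments to `.format(` are cut off in the hunk above, so `name`, `summary` and `options_ref` are placeholder names chosen for this example, and the sample values are made up.

```python
def csv_row(name, summary, options_ref):
    """Format one row of the generated csv-table using an f-string."""
    return f'   "{name}", "{summary}", "{options_ref}"'


# The :class: reference from the diff, rewritten as an f-string.
options_class = "ScalarAggregateOptions"
options_ref = f":class:`{options_class}`"

# Placeholder values; the real ones would come from the function registry.
row = csv_row("sum", "Placeholder summary", options_ref)
print(row)
```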

##########
File path: docs/source/python/compute.rst
##########
@@ -62,8 +77,89 @@ Here is an example of sorting a table:
       0
     ]
 
-
+For a complete list of the compute functions that PyArrow provides
+you can refer to :ref:`api.compute` reference.
 
 .. seealso::
 
    :ref:`Available compute functions (C++ documentation) <compute-function-list>`.
+
+Grouped Aggregations
+====================
+
+PyArrow supports grouped aggregations over :class:`pyarrow.Table` through the
+:meth:`pyarrow.Table.group_by` method. 
+The method will return a grouping declaration
+to which the hash aggregation functions can be applied::
+
+   >>> import pyarrow as pa
+   >>> t = pa.table([
+   ...       pa.array(["a", "a", "b", "b", "c"]),
+   ...       pa.array([1, 2, 3, 4, 5]),
+   ... ], names=["keys", "values"])
+   >>> t.group_by("keys").aggregate([("values", "sum")])
+   pyarrow.Table
+   values_sum: int64
+   keys: string
+   ----
+   values_sum: [[3,7,5]]
+   keys: [["a","b","c"]]
+
+The ``"sum"`` aggregation passed to the ``aggregate`` method in the previous
+example is the :func:`hash_sum` compute function. 

Review comment:
       Is `hash_sum` actually exposed in the docs? If not, I suppose the `:func:` role will not link to anything.



