Posted to github@arrow.apache.org by "jorisvandenbossche (via GitHub)" <gi...@apache.org> on 2023/03/28 13:09:45 UTC

[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #34759: GH-34579: [Python][Docs] TableGroupBy.aggregate options

jorisvandenbossche commented on code in PR #34759:
URL: https://github.com/apache/arrow/pull/34759#discussion_r1150583383


##########
python/pyarrow/table.pxi:
##########
@@ -5515,6 +5515,9 @@ list[tuple(str, str, FunctionOptions)]
             column names, for unary, nullary and n-ary aggregation functions
             respectively.
 
+            For the list of function names and respective aggregation
+            function options see: :ref:`py-grouped-aggrs`.

Review Comment:
   ```suggestion
               function options see :ref:`py-grouped-aggrs`.
   ```



##########
python/pyarrow/table.pxi:
##########
@@ -5527,20 +5530,58 @@ list[tuple(str, str, FunctionOptions)]
         ...       pa.array(["a", "a", "b", "b", "c"]),
         ...       pa.array([1, 2, 3, 4, 5]),
         ... ], names=["keys", "values"])
+
+        Sum the column "values" over the grouped column "keys":
+
         >>> t.group_by("keys").aggregate([("values", "sum")])
         pyarrow.Table
         values_sum: int64
         keys: string
         ----
         values_sum: [[3,7,5]]
         keys: [["a","b","c"]]
+
+        Count the rows over the grouped column "keys":
+
         >>> t.group_by("keys").aggregate([([], "count_all")])
         pyarrow.Table
         count_all: int64
         keys: string
         ----
         count_all: [[2,2,1]]
         keys: [["a","b","c"]]
+
+        Do multiple aggregations:
+
+        >>> t.group_by("keys").aggregate([
+        ...    ("values", "sum"),
+        ...    ("keys", "count")
+        ... ])
+        pyarrow.Table
+        values_sum: int64
+        keys_count: int64
+        keys: string
+        ----
+        values_sum: [[3,7,5]]
+        keys_count: [[2,2,1]]
+        keys: [["a","b","c"]]
+
+        Count the number of non-null values for column "values"
+        over the grouped column "keys":
+
+        >>> import pyarrow.compute as pc
+        >>> t.group_by(["keys"]).aggregate([
+        ...    ("values", "count", pc.CountOptions(mode="all"))

Review Comment:
   ```suggestion
           ...    ("values", "count", pc.CountOptions(mode="only_valid"))
   ```
   
   If you want the "number of non-null values" as described above, you need this option. It is actually the default, but I think it is OK to show it explicitly.


