Posted to github@arrow.apache.org by "jorisvandenbossche (via GitHub)" <gi...@apache.org> on 2023/03/28 13:09:45 UTC
[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #34759: GH-34579: [Python][Docs] TableGroupBy.aggregate options
jorisvandenbossche commented on code in PR #34759:
URL: https://github.com/apache/arrow/pull/34759#discussion_r1150583383
##########
python/pyarrow/table.pxi:
##########
@@ -5515,6 +5515,9 @@ list[tuple(str, str, FunctionOptions)]
column names, for unary, nullary and n-ary aggregation functions
respectively.
+ For the list of function names and respective aggregation
+ function options see: :ref:`py-grouped-aggrs`.
Review Comment:
```suggestion
function options see :ref:`py-grouped-aggrs`.
```
##########
python/pyarrow/table.pxi:
##########
@@ -5527,20 +5530,58 @@ list[tuple(str, str, FunctionOptions)]
... pa.array(["a", "a", "b", "b", "c"]),
... pa.array([1, 2, 3, 4, 5]),
... ], names=["keys", "values"])
+
+ Sum the column "values" over the grouped column "keys":
+
>>> t.group_by("keys").aggregate([("values", "sum")])
pyarrow.Table
values_sum: int64
keys: string
----
values_sum: [[3,7,5]]
keys: [["a","b","c"]]
+
+ Count the rows over the grouped column "keys":
+
>>> t.group_by("keys").aggregate([([], "count_all")])
pyarrow.Table
count_all: int64
keys: string
----
count_all: [[2,2,1]]
keys: [["a","b","c"]]
+
+ Do multiple aggregations:
+
+ >>> t.group_by("keys").aggregate([
+ ... ("values", "sum"),
+ ... ("keys", "count")
+ ... ])
+ pyarrow.Table
+ values_sum: int64
+ keys_count: int64
+ keys: string
+ ----
+ values_sum: [[3,7,5]]
+ keys_count: [[2,2,1]]
+ keys: [["a","b","c"]]
+
+ Count the number of non-null values for column "values"
+ over the grouped column "keys":
+
+ >>> import pyarrow.compute as pc
+ >>> t.group_by(["keys"]).aggregate([
+ ... ("values", "count", pc.CountOptions(mode="all"))
Review Comment:
```suggestion
... ("values", "count", pc.CountOptions(mode="only_valid"))
```
If you want the "number of non-null values" as mentioned above, you need this option (which is actually the default, but OK to show it explicitly I think)