Posted to github@arrow.apache.org by "AlenkaF (via GitHub)" <gi...@apache.org> on 2023/03/28 11:35:48 UTC

[GitHub] [arrow] AlenkaF opened a new pull request, #34759: GH-34579: [Python][Docs] TableGroupBy.aggregate options

AlenkaF opened a new pull request, #34759:
URL: https://github.com/apache/arrow/pull/34759

   ### Rationale for this change
   
   Add more information and examples to the `pa.TableGroupBy.aggregate` method docstring to make the method easier to use.
   
   ### What changes are included in this PR?
   
   Changes in the `pa.TableGroupBy.aggregate` docstrings include:
   - link to https://arrow.apache.org/docs/python/compute.html#grouped-aggregations
   - extra examples


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche merged pull request #34759: GH-34579: [Python][Docs] TableGroupBy.aggregate options

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche merged PR #34759:
URL: https://github.com/apache/arrow/pull/34759




[GitHub] [arrow] github-actions[bot] commented on pull request #34759: GH-34579: [Python][Docs] TableGroupBy.aggregate options

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #34759:
URL: https://github.com/apache/arrow/pull/34759#issuecomment-1488120128

   Revision: 4b3161b4c0d3539c9ef01859d737393f4e430676
   
   Submitted crossbow builds: [ursacomputing/crossbow @ actions-2d3b2209d1](https://github.com/ursacomputing/crossbow/branches/all?query=actions-2d3b2209d1)
   
   |Task|Status|
   |----|------|
   |preview-docs|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-2d3b2209d1-github-preview-docs)](https://github.com/ursacomputing/crossbow/actions/runs/4551630836/jobs/8025935557)|




[GitHub] [arrow] AlenkaF commented on pull request #34759: GH-34579: [Python][Docs] TableGroupBy.aggregate options

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on PR #34759:
URL: https://github.com/apache/arrow/pull/34759#issuecomment-1488353836

   Generated docs for the change in this PR can be viewed here: http://crossbow.voltrondata.com/pr_docs/34759/python/generated/pyarrow.TableGroupBy.html#pyarrow.TableGroupBy.aggregate




[GitHub] [arrow] AlenkaF commented on pull request #34759: GH-34579: [Python][Docs] TableGroupBy.aggregate options

Posted by "AlenkaF (via GitHub)" <gi...@apache.org>.
AlenkaF commented on PR #34759:
URL: https://github.com/apache/arrow/pull/34759#issuecomment-1488115883

   @github-actions crossbow submit preview-docs




[GitHub] [arrow] ursabot commented on pull request #34759: GH-34579: [Python][Docs] TableGroupBy.aggregate options

Posted by "ursabot (via GitHub)" <gi...@apache.org>.
ursabot commented on PR #34759:
URL: https://github.com/apache/arrow/pull/34759#issuecomment-1489037296

   Benchmark runs are scheduled for baseline = 2c9340dd1795bedcc6a43975e26580b4f27912ad and contender = 24ba1cf75c01040b279514ecc063975de272b766. 24ba1cf75c01040b279514ecc063975de272b766 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/7de9f55bb95e404c9bb07ec95edf6648...12cc70195283455b85ee6dfee92d555e/)
   [Failed :arrow_down:0.8% :arrow_up:1.57%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/ac5d5deb91404233be108cf347dd9ea0...73c07ce7c0554d13a81515856991737b/)
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/abe49752e46c474383c1bbe4338cbd0e...c6815716b5e342968f63d3255dfdf216/)
   [Finished :arrow_down:0.25% :arrow_up:0.0%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/edc63e1a607e45b8af0f8cff72319eba...e6a4a34e7e6349019c4153a559ab5b27/)
   Buildkite builds:
   [Finished] [`24ba1cf7` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2592)
   [Failed] [`24ba1cf7` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2622)
   [Finished] [`24ba1cf7` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2590)
   [Finished] [`24ba1cf7` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2613)
   [Finished] [`2c9340dd` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2591)
   [Finished] [`2c9340dd` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2621)
   [Finished] [`2c9340dd` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2589)
   [Finished] [`2c9340dd` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2612)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   




[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #34759: GH-34579: [Python][Docs] TableGroupBy.aggregate options

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on code in PR #34759:
URL: https://github.com/apache/arrow/pull/34759#discussion_r1150583383


##########
python/pyarrow/table.pxi:
##########
@@ -5515,6 +5515,9 @@ list[tuple(str, str, FunctionOptions)]
             column names, for unary, nullary and n-ary aggregation functions
             respectively.
 
+            For the list of function names and respective aggregation
+            function options see: :ref:`py-grouped-aggrs`.

Review Comment:
   ```suggestion
               function options see :ref:`py-grouped-aggrs`.
   ```



##########
python/pyarrow/table.pxi:
##########
@@ -5527,20 +5530,58 @@ list[tuple(str, str, FunctionOptions)]
         ...       pa.array(["a", "a", "b", "b", "c"]),
         ...       pa.array([1, 2, 3, 4, 5]),
         ... ], names=["keys", "values"])
+
+        Sum the column "values" over the grouped column "keys":
+
         >>> t.group_by("keys").aggregate([("values", "sum")])
         pyarrow.Table
         values_sum: int64
         keys: string
         ----
         values_sum: [[3,7,5]]
         keys: [["a","b","c"]]
+
+        Count the rows over the grouped column "keys":
+
         >>> t.group_by("keys").aggregate([([], "count_all")])
         pyarrow.Table
         count_all: int64
         keys: string
         ----
         count_all: [[2,2,1]]
         keys: [["a","b","c"]]
+
+        Do multiple aggregations:
+
+        >>> t.group_by("keys").aggregate([
+        ...    ("values", "sum"),
+        ...    ("keys", "count")
+        ... ])
+        pyarrow.Table
+        values_sum: int64
+        keys_count: int64
+        keys: string
+        ----
+        values_sum: [[3,7,5]]
+        keys_count: [[2,2,1]]
+        keys: [["a","b","c"]]
+
+        Count the number of non-null values for column "values"
+        over the grouped column "keys":
+
+        >>> import pyarrow.compute as pc
+        >>> t.group_by(["keys"]).aggregate([
+        ...    ("values", "count", pc.CountOptions(mode="all"))

Review Comment:
   ```suggestion
           ...    ("values", "count", pc.CountOptions(mode="only_valid"))
   ```
   
   If you want the "number of non-null values" as mentioned above, you need this option (it is actually the default, but I think it is OK to show it explicitly).


