Posted to github@arrow.apache.org by "thisisnic (via GitHub)" <gi...@apache.org> on 2023/06/11 16:49:07 UTC

[GitHub] [arrow-cookbook] thisisnic commented on a diff in pull request #311: Document window aggregates

thisisnic commented on code in PR #311:
URL: https://github.com/apache/arrow-cookbook/pull/311#discussion_r1225864495


##########
r/content/tables.Rmd:
##########
@@ -317,4 +317,52 @@ the development version of the Arrow R package had been associated with their op
 classes.  However, as the Arrow C++ library's functionality extends, compute 
 functions may be added which do not yet have an R binding.  If you find a C++ 
 compute function which you wish to use from the R package, please [open an issue
-on the project JIRA](https://issues.apache.org/jira/projects/ARROW/issues).
\ No newline at end of file
+on the project JIRA](https://issues.apache.org/jira/projects/ARROW/issues).
+
+## Compute Window Aggregates
+
+Arrow does not support window functions, which can be problematic when applying an aggregate like `mean()` on a grouped table or within a rowwise operation like `filter()`:

Review Comment:
   Please could you rephrase this to describe the task that the user is trying to achieve (e.g. "you want to...")



##########
r/content/tables.Rmd:
##########
@@ -317,4 +317,52 @@ the development version of the Arrow R package had been associated with their op
 classes.  However, as the Arrow C++ library's functionality extends, compute 
 functions may be added which do not yet have an R binding.  If you find a C++ 
 compute function which you wish to use from the R package, please [open an issue
-on the project JIRA](https://issues.apache.org/jira/projects/ARROW/issues).
\ No newline at end of file
+on the project JIRA](https://issues.apache.org/jira/projects/ARROW/issues).
+
+## Compute Window Aggregates
+
+Arrow does not support window functions, which can be problematic when applying an aggregate like `mean()` on a grouped table or within a rowwise operation like `filter()`:
+
+```{r, arrow_window_aggregate}
+arrow_table(starwars) %>%
+  select(1:4) %>%
+  filter(!is.na(hair_color)) %>%
+  group_by(hair_color) %>%
+  filter(height < mean(height, na.rm = TRUE))
+```
+
+Arrow pulls the data into R, but for large tables, this sacrifices performance. You can perform these window aggregate operations on Arrow tables by:
+
+- Computing the aggregation separately, and join the result
+- Passing the data to DuckDB

Review Comment:
   To follow the formula used in the other recipes, this list of workarounds would be better placed in the "Discussion" section.
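
   For reference, the two workarounds listed in the diff could look roughly like the sketch below. It is a minimal illustration, not the PR's final text. It assumes the `arrow`, `dplyr`, `duckdb`, and `dbplyr` packages are installed. The names `sw`, `mean_heights`, `mean_height`, `via_join`, and `via_duckdb` are hypothetical and chosen only for this example.

   ```r
   library(arrow)
   library(dplyr)

   # Same starting point as the recipe: first four columns, no missing hair colour
   sw <- arrow_table(starwars) %>%
     select(1:4) %>%
     filter(!is.na(hair_color))

   # Workaround 1: compute the grouped aggregate separately, then join it back
   mean_heights <- sw %>%
     group_by(hair_color) %>%
     summarise(mean_height = mean(height, na.rm = TRUE))

   via_join <- sw %>%
     left_join(mean_heights, by = "hair_color") %>%
     filter(height < mean_height) %>%
     select(-mean_height) %>%
     collect()

   # Workaround 2: hand the data to DuckDB, which evaluates the window aggregate
   via_duckdb <- sw %>%
     to_duckdb() %>%
     group_by(hair_color) %>%
     filter(height < mean(height, na.rm = TRUE)) %>%
     to_arrow() %>%
     collect()
   ```

   Both versions keep the work out of R's memory until `collect()`. The join version needs only Arrow, while the DuckDB version keeps the original `filter()` expression unchanged.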



##########
r/content/tables.Rmd:
##########
@@ -317,4 +317,52 @@ the development version of the Arrow R package had been associated with their op
 classes.  However, as the Arrow C++ library's functionality extends, compute 
 functions may be added which do not yet have an R binding.  If you find a C++ 
 compute function which you wish to use from the R package, please [open an issue
-on the project JIRA](https://issues.apache.org/jira/projects/ARROW/issues).
\ No newline at end of file
+on the project JIRA](https://issues.apache.org/jira/projects/ARROW/issues).

Review Comment:
   Would you mind also updating this to refer to GitHub issues instead of JIRA?


