Posted to commits@arrow.apache.org by th...@apache.org on 2023/05/26 08:09:20 UTC

[arrow] branch main updated: GH-35779: [R][Documentation] Document workaround for window-like functionality (#35702)

This is an automated email from the ASF dual-hosted git repository.

thisisnic pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/main by this push:
     new 77a7130509 GH-35779: [R][Documentation] Document workaround for window-like functionality (#35702)
77a7130509 is described below

commit 77a71305090fcebcc87fa5b8c7dba1398cef9f68
Author: David Greiss <dg...@users.noreply.github.com>
AuthorDate: Fri May 26 04:09:08 2023 -0400

    GH-35779: [R][Documentation] Document workaround for window-like functionality (#35702)
    
    Issue #29537 describes how to perform an implicit window function.
    
    Documenting this operation in the vignette was discussed on the mailing list: https://lists.apache.org/thread/b16ghtb8q9hyl64ks3dp9ftm7pvlnsdk. It's still not clear whether this is the preferred way to apply these operations, but there is potential for significant performance gains on large data sets.
    
    * Closes: #35779
    
    Lead-authored-by: David Greiss <dg...@users.noreply.github.com>
    Co-authored-by: eitsupi <50...@users.noreply.github.com>
    Signed-off-by: Nic Crane <th...@gmail.com>
---
 r/vignettes/data_wrangling.Rmd | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/r/vignettes/data_wrangling.Rmd b/r/vignettes/data_wrangling.Rmd
index 129f462ece..bad1d4bd58 100644
--- a/r/vignettes/data_wrangling.Rmd
+++ b/r/vignettes/data_wrangling.Rmd
@@ -165,6 +165,33 @@ sw2 %>%
   transmute(name, height, mass, res = residuals(lm(mass ~ height)))
 ```
 
+Because window functions are not supported, computing an aggregation like `mean()` on a grouped table or within a rowwise operation like `filter()` is not possible:
+
+```{r}
+sw %>%
+  select(1:4) %>%
+  filter(!is.na(hair_color)) %>%
+  group_by(hair_color) %>%
+  filter(height < mean(height, na.rm = TRUE))
+```
+
+This operation can be accomplished in arrow by computing the aggregation separately, for example within a join operation: 
+
+```{r}
+sw %>%
+  select(1:4) %>%
+  filter(!is.na(hair_color)) %>%
+  left_join(
+    sw %>%
+      group_by(hair_color) %>%
+      summarize(mean_height = mean(height, na.rm = TRUE))
+  ) %>%
+  filter(height < mean_height) %>%
+  select(!mean_height) %>%
+  collect()
+```
+
+
 ## Further reading
 
 - To learn more about multi-file datasets, see the [dataset article](./dataset.html).