Posted to commits@arrow.apache.org by th...@apache.org on 2023/05/26 08:09:20 UTC
[arrow] branch main updated: GH-35779: [R][Documentation] Document workaround for window-like functionality (#35702)
This is an automated email from the ASF dual-hosted git repository.
thisisnic pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git
The following commit(s) were added to refs/heads/main by this push:
new 77a7130509 GH-35779: [R][Documentation] Document workaround for window-like functionality (#35702)
77a7130509 is described below
commit 77a71305090fcebcc87fa5b8c7dba1398cef9f68
Author: David Greiss <dg...@users.noreply.github.com>
AuthorDate: Fri May 26 04:09:08 2023 -0400
GH-35779: [R][Documentation] Document workaround for window-like functionality (#35702)
Issue #29537 describes how to perform an implicit window function.
A mailing list discussion (https://lists.apache.org/thread/b16ghtb8q9hyl64ks3dp9ftm7pvlnsdk) proposed documenting this operation in the vignette. It is still not clear whether this is the preferred way to apply these operations, but there is potential for significant performance gains on large data sets.
* Closes: #35779
Lead-authored-by: David Greiss <dg...@users.noreply.github.com>
Co-authored-by: eitsupi <50...@users.noreply.github.com>
Signed-off-by: Nic Crane <th...@gmail.com>
---
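A minimal sketch of the pattern this commit documents, for readers who want to check that the join-based workaround matches a grouped (window-style) filter. It runs on the in-memory `dplyr::starwars` data frame rather than on the Arrow table `sw` used in the vignette, so it only assumes that dplyr is installed. The names `sw_df`, `windowed`, `group_means`, and `joined` are illustrative, and the explicit `by = "hair_color"` is an addition that is not in the vignette text.

```r
library(dplyr)

# In-memory stand-in for the vignette's Arrow table (assumption: plain dplyr)
sw_df <- starwars %>%
  select(1:4) %>%
  filter(!is.na(hair_color))

# Window-style grouped filter: supported on a data frame
windowed <- sw_df %>%
  group_by(hair_color) %>%
  filter(height < mean(height, na.rm = TRUE)) %>%
  ungroup()

# Workaround: compute the per-group aggregate separately, then join it back
group_means <- sw_df %>%
  group_by(hair_color) %>%
  summarize(mean_height = mean(height, na.rm = TRUE))

joined <- sw_df %>%
  left_join(group_means, by = "hair_color") %>%
  filter(height < mean_height) %>%
  select(!mean_height)

# Both approaches should keep the same rows
same_rows <- isTRUE(all.equal(
  as.data.frame(arrange(windowed, name)),
  as.data.frame(arrange(joined, name))
))
```

On an Arrow table the first form is the one the diff below describes as unsupported; the second form is the one to adapt, ending the pipeline with `collect()`.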
r/vignettes/data_wrangling.Rmd | 27 +++++++++++++++++++++++++++
1 file changed, 27 insertions(+)
diff --git a/r/vignettes/data_wrangling.Rmd b/r/vignettes/data_wrangling.Rmd
index 129f462ece..bad1d4bd58 100644
--- a/r/vignettes/data_wrangling.Rmd
+++ b/r/vignettes/data_wrangling.Rmd
@@ -165,6 +165,33 @@ sw2 %>%
transmute(name, height, mass, res = residuals(lm(mass ~ height)))
```
+Because window functions are not supported, computing an aggregation like `mean()` on a grouped table or within a rowwise operation like `filter()` is not supported:
+
+```{r}
+sw %>%
+  select(1:4) %>%
+  filter(!is.na(hair_color)) %>%
+  group_by(hair_color) %>%
+  filter(height < mean(height, na.rm = TRUE))
+```
+
+This operation can be accomplished in arrow by computing the aggregation separately, for example within a join operation:
+
+```{r}
+sw %>%
+  select(1:4) %>%
+  filter(!is.na(hair_color)) %>%
+  left_join(
+    sw %>%
+      group_by(hair_color) %>%
+      summarize(mean_height = mean(height, na.rm = TRUE))
+  ) %>%
+  filter(height < mean_height) %>%
+  select(!mean_height) %>%
+  collect()
+```
+
+
## Further reading
- To learn more about multi-file datasets, see the [dataset article](./dataset.html).