Posted to jira@arrow.apache.org by "Dewey Dunnington (Jira)" <ji...@apache.org> on 2022/12/12 16:05:00 UTC
[jira] [Updated] (ARROW-17361) [R] dplyr::summarize fails with division when divisor is a variable
[ https://issues.apache.org/jira/browse/ARROW-17361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dewey Dunnington updated ARROW-17361:
-------------------------------------
Fix Version/s: 11.0.0
> [R] dplyr::summarize fails with division when divisor is a variable
> -------------------------------------------------------------------
>
> Key: ARROW-17361
> URL: https://issues.apache.org/jira/browse/ARROW-17361
> Project: Apache Arrow
> Issue Type: Bug
> Components: R
> Affects Versions: 8.0.0
> Reporter: Oliver Reiter
> Priority: Minor
> Labels: aggregation, dplyr
> Fix For: 11.0.0
>
>
> Hello,
> I found this odd behaviour when trying to compute an aggregate with dplyr::summarize: when I use a pre-defined variable as the divisor while aggregating, the execution fails with 'unsupported expression'. When I use the value of the variable directly in the aggregation, it works.
>
> See below:
>
> {code:java}
> library(dplyr)
> library(arrow)
> small_dataset <- tibble::tibble(
>   ## x = rep(c("a", "b"), each = 5),
>   y = rep(1:5, 2)
> )
> ## convert "small_dataset" into a ...dataset
> tmpdir <- tempfile()
> dir.create(tmpdir)
> write_dataset(small_dataset, tmpdir)
> ## works: the divisor is a literal
> open_dataset(tmpdir) %>%
>   summarize(value = sum(y) / 10) %>%
>   collect()
> ## fails: the divisor is a variable defined in the R session
> scale_factor <- 10
> open_dataset(tmpdir) %>%
>   summarize(value = sum(y) / scale_factor) %>%
>   collect()
> #> Error: Error in summarize_eval(names(exprs)[i],
> #> exprs[[i]], ctx, length(.data$group_by_vars) > :
> #> Expression sum(y)/scale_factor is not an aggregate
> #> expression or is not supported in Arrow
> #> Call collect() first to pull data into R.
> {code}
> I was not sure how to name this issue/bug (if it is one), so if there is a clearer, more descriptive title, you're welcome to adjust it.
>
> Thanks for your work!
>
> Oliver
>
> {code:java}
> > arrow_info()
> Arrow package version: 8.0.0
> Capabilities:
>
> dataset TRUE
> substrait FALSE
> parquet TRUE
> json TRUE
> s3 TRUE
> utf8proc TRUE
> re2 TRUE
> snappy TRUE
> gzip TRUE
> brotli TRUE
> zstd TRUE
> lz4 TRUE
> lz4_frame TRUE
> lzo FALSE
> bz2 TRUE
> jemalloc TRUE
> mimalloc TRUE
> Memory:
>
> Allocator jemalloc
> Current 64 bytes
> Max 41.25 Kb
> Runtime:
>
> SIMD Level avx2
> Detected SIMD Level avx2
> Build:
>
> C++ Library Version 8.0.0
> C++ Compiler GNU
> C++ Compiler Version 12.1.0
> {code}
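
For the failing summarize() call in the report above, the error message itself suggests pulling data into R first. A minimal sketch of that idea, not taken from the ticket: aggregate in Arrow, collect() the one-row result, then divide by the variable in plain dplyr. The column name `total` and the object `result` are made up for illustration; the toy data matches the report.

```r
library(dplyr)
library(arrow)

# Same toy data as in the report
small_dataset <- tibble::tibble(y = rep(1:5, 2))
tmpdir <- tempfile()
dir.create(tmpdir)
write_dataset(small_dataset, tmpdir)

scale_factor <- 10

# Let Arrow compute only the aggregate, then do the division in R,
# where scale_factor is an ordinary variable in scope.
result <- open_dataset(tmpdir) %>%
  summarize(total = sum(y)) %>%   # hypothetical intermediate column
  collect() %>%
  mutate(value = total / scale_factor)

print(result)
```

Only the aggregated row is transferred to R, so the cost of collecting early should be small here. This sidesteps the problem in affected versions and says nothing about how the fix in 11.0.0 is implemented.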
--
This message was sent by Atlassian Jira
(v8.20.10#820010)