Posted to jira@arrow.apache.org by "Neal Richardson (Jira)" <ji...@apache.org> on 2021/03/15 20:52:00 UTC
[jira] [Resolved] (ARROW-11925) [R] Add `between` method for arrow_dplyr_query
[ https://issues.apache.org/jira/browse/ARROW-11925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Neal Richardson resolved ARROW-11925.
-------------------------------------
Resolution: Fixed
Issue resolved by pull request 9674
[https://github.com/apache/arrow/pull/9674]
> [R] Add `between` method for arrow_dplyr_query
> ----------------------------------------------
>
> Key: ARROW-11925
> URL: https://issues.apache.org/jira/browse/ARROW-11925
> Project: Apache Arrow
> Issue Type: New Feature
> Components: R
> Reporter: Sam Albers
> Assignee: Sam Albers
> Priority: Minor
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Time Spent: 1h 40m
> Remaining Estimate: 0h
>
> Would you consider a PR to add a `between` method for `arrow_dplyr_query` objects? Even a version implemented directly in R takes advantage of Arrow's speed. Here is what I am thinking:
> Typical usage of `between`:
>
> {code:java}
> library(dplyr)
> library(arrow)
> iris %>% filter(between(Petal.Length, 1, 1.1))
> {code}
>
> Here is a mocked up version of the method:
>
> {code:java}
> between_mock <- function(x, left, right) {
>   # Both bounds must be scalars, as with dplyr::between()
>   if (length(left) != 1) {
>     rlang::abort("`left` must be length 1")
>   }
>   if (length(right) != 1) {
>     rlang::abort("`right` must be length 1")
>   }
>   # Inclusive on both ends
>   x >= left & x <= right
> }
> {code}
> I think `between` doesn't work out of the box because `dplyr` implements it in C++ for efficiency:
> {code:java}
> open_dataset("nyc-taxi", partitioning = "year") %>%
>   filter(year == 2014) %>%
>   select(year, fare_amount) %>%
>   filter(between(fare_amount, 10, 11)) %>%
>   collect()
> Error: Filter expression not supported for Arrow Datasets: between(fare_amount, 10, 11)
> Call collect() first to pull data into R.
> In addition: Warning message:
> between() called on numeric vector with S3 class
> Backtrace:
> x
> 1. +-[ `%>%`(...) ]
> 2. +-[ dplyr::collect(...) ]
> 3. +-[ dplyr::filter(...) ]
> 4. \-arrow:::filter.arrow_dplyr_query(...)
> {code}
> But even my simple implementation works fine:
> {code:java}
> open_dataset("nyc-taxi", partitioning = "year") %>%
>   filter(year == 2014) %>%
>   select(year, fare_amount) %>%
>   filter(between_mock(fare_amount, 10, 11)) %>%
>   collect()
> {code}
>
>
>
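For readers who want to check the comparison logic from the quoted description outside Arrow, here is a minimal sketch, assuming plain numeric vectors. The names `between_sketch` and `petal_length` and the sample values are hypothetical and only for illustration, and base R's `stop()` is swapped in for `rlang::abort()` so the snippet runs without extra packages. It does not show how the merged pull request implements `between`.

```r
# Hypothetical stand-in for the mocked-up method quoted above; base R's
# stop() replaces rlang::abort() so the sketch has no package dependencies.
between_sketch <- function(x, left, right) {
  if (length(left) != 1) {
    stop("`left` must be length 1")
  }
  if (length(right) != 1) {
    stop("`right` must be length 1")
  }
  # Both bounds are inclusive; the comparison is vectorised over x.
  x >= left & x <= right
}

petal_length <- c(0.9, 1.0, 1.05, 1.1, 1.2)  # made-up sample values
in_range <- between_sketch(petal_length, 1, 1.1)
print(in_range)

# A bound whose length is not 1 is rejected before any comparison runs.
bad_bound <- tryCatch(
  between_sketch(petal_length, c(1, 2), 1.1),
  error = function(e) conditionMessage(e)
)
print(bad_bound)
```

Because the body uses only `>=`, `<=`, and `&`, the same function can be applied to anything that defines those operators, which is presumably why the quoted mock-up works inside an Arrow `filter()` call.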
--
This message was sent by Atlassian Jira
(v8.3.4#803005)